Abstract

Lyapunov drift and Lyapunov optimization are powerful techniques for optimizing time averages in stochastic
queueing networks subject to stability. However, there are various definitions of queue stability in the literature,
and the most convenient Lyapunov drift conditions often provide stability and performance bounds only in terms
of a time average expectation, rather than a pure time average. We extend the theory to show that for quadratic
Lyapunov functions, the basic drift condition, together with a mild bounded fourth moment condition, implies all
major forms of stability. Further, we show that the basic drift-plus-penalty condition implies that the same bounds
for queue backlog and penalty expenditure that are known to hold for time average expectations also hold for pure
time averages with probability 1. Our analysis combines Lyapunov drift theory with the Kolmogorov law of large
numbers for martingale differences with finite variance.
I. Introduction

Lyapunov optimization is a powerful technique for optimizing time averages in stochastic queueing 
networks (see [1]-[13]). Work in [1] presents a drift-plus-penalty theorem that provides a methodology 
for designing control algorithms to maximize time average network utility subject to queue stability. 
The theorem also provides explicit performance tradeoffs between utility maximization and average 
queue backlog. Example applications include maximizing network throughput subject to average power 
constraints, minimizing average power expenditure subject to network stability, and maximizing network 
throughput-utility subject to network stability [1]-[5]. The drift-plus-penalty theorem provides performance
bounds in terms of time average expectations. Time average expectations are the same as pure time
averages (with probability 1) in certain cases, such as when the system evolves on an irreducible and
positive recurrent Markov chain with a finite or countably infinite state space (and when some additional
mild assumptions are satisfied). However, many systems have an uncountably infinite state space and/or do
not have the required Markov structure. It is not clear if pure time averages satisfy the same guarantees in
general. This paper proves a sample path version of the drift-plus-penalty theorem, showing that if fourth
moment boundedness conditions are satisfied, then the same guarantees hold for pure time averages with 
probability 1. 

To understand this result and the systems it can be applied to, we consider a stochastic queueing 
network that evolves in discrete time with unit timeslots $t \in \{0, 1, 2, \ldots\}$. Suppose there are $K$ queues,
and let $Q(t) = (Q_1(t), \ldots, Q_K(t))$ represent the vector of current queue backlogs. Random events, such
as random channel conditions and packet arrivals, can take place every slot. A network controller reacts
to the random events by choosing a control action every slot. The control action affects queue arrival and
service variables on slot $t$, and also incurs a vector of penalties $y(t) = (y_0(t), y_1(t), \ldots, y_M(t))$. The goal
is to stabilize all network queues while minimizing the time average of $y_0(t)$ subject to the time averages
of $y_m(t)$ being less than or equal to 0:

$$\text{Minimize:} \quad \overline{y}_0 \tag{1}$$

$$\text{Subject to:} \quad \text{(1) } \overline{y}_m \leq 0 \quad \forall m \in \{1, \ldots, M\} \tag{2}$$

$$\text{(2) All queues are stable} \tag{3}$$



Assuming that the problem is feasible and that a certain drift-plus-penalty condition is met, the existing 
drift-plus-penalty theory in [1] can solve this problem by specifying a class of algorithms, parameterized 
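by a constant $V > 0$ chosen as desired, to yield:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \mathbb{E}\{y_0(\tau)\} \leq y_0^* + O(1/V) \tag{4}$$

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \mathbb{E}\{y_m(\tau)\} \leq 0 \quad \forall m \in \{1, \ldots, M\} \tag{5}$$

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \mathbb{E}\{|Q_k(\tau)|\} \leq O(V) \quad \forall k \in \{1, \ldots, K\} \tag{6}$$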

where $y_0^*$ is the infimum time average of $y_0(t)$ over all algorithms that can satisfy the desired constraints.
The guarantee (6) implies that the limsup time average expected queue backlog is finite for all queues,
and is a condition often called strong stability. The above bounds say that the time average constraints
$\overline{y}_m \leq 0$ are satisfied for all $m \in \{1, \ldots, M\}$ in a time average expected sense, that all queues $Q_k(t)$
are strongly stable with average backlog $O(V)$, and time average expected penalty is within $O(1/V)$
of optimality. The $O(1/V)$ penalty gap can be made arbitrarily small by choosing a suitably large $V$
parameter, at the expense of increasing the average backlog bound linearly with $V$.
We would like to know when we can also claim that:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} y_0(\tau) \leq y_0^* + O(1/V) \quad \text{(w.p.1)} \tag{7}$$

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} y_m(\tau) \leq 0 \quad \text{(w.p.1)} \quad \forall m \in \{1, \ldots, M\} \tag{8}$$

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} |Q_k(\tau)| \leq O(V) \quad \text{(w.p.1)} \quad \forall k \in \{1, \ldots, K\} \tag{9}$$

where "w.p.1" stands for "with probability 1." This paper shows that (7)-(9) hold if a similar drift-plus-penalty
condition holds, and additionally if the $y_m(t)$ penalties and the one-slot changes in queue backlogs
have conditionally bounded fourth moments given the past.

We note that related problems of minimizing convex functions of time averages, rather than minimizing
time averages themselves, can be transformed into problems of the type (1)-(3) using a technique of
auxiliary variables [3][1][8][14]. Hence, these extended problems can also be treated using the framework
of this paper. However, for brevity we limit attention to problems of the type (1)-(3).

A. On relationships between time average expectations and time averages

It is known by Fatou's Lemma that if a random process is deterministically lower-bounded (such as
being non-negative) and has time averages that converge to a constant with probability 1, then this constant
must be less than or equal to the liminf time average expectation [15]. Thus, the inequalities (4)-(6) imply
(7)-(9) when the $y_m(t)$ and $|Q_k(t)|$ processes are deterministically lower bounded and have convergent
time averages with probability 1. Systems that evolve on positive recurrent irreducible Markov chains
with finite or countably infinite state space can often be shown to have convergent time average penalties.
Further, if the Markov chain is irreducible and has a finite or countably infinite state space with the property
that the event $\{\sum_{k=1}^{K} |Q_k| \leq \theta\}$ corresponds to only a finite number of states for each real number $\theta$,
then the condition (6) implies positive recurrence. However, in addition to the actual network queues, the
drift-plus-penalty method introduces virtual queues to enforce the desired time average constraints. These 
queues typically give the overall system an uncountably infinite state space. Time average convergence 
can be shown using generalized Harris recurrence theory for Markov chains with uncountably infinite 
state space, provided that certain generalized irreducibility assumptions and petite set assumptions are 
satisfied [16]. However, it is often difficult to check if these assumptions hold for the particular systems 
of interest. 

Strong stability of a queue Q(t), together with either deterministically bounded arrival or server rate 
processes, implies rate stability [17]. Rate stability of $Q(t)$ means that $\lim_{t\to\infty} Q(t)/t = 0$ with probability
1. This result can be used to prove that (8) holds if the $y_m(t)$ processes are suitably deterministically
bounded on each slot $t$. However, this does not ensure the constraints (7) or (9) hold.

Certain types of systems, such as networks with flow control, often have a structure that yields
deterministically bounded queues [4][18], which can be used to ensure constraints (8)-(9) hold for those
systems. However, this requires special structure, and it also does not ensure (7) holds unless suitable
Markov chain assumptions are met. 

B. Alternative algorithms 

A dual-based algorithm related to the drift-plus-penalty method is considered for a wireless downlink
with "infinite backlog" in [7], and convergence to near-optimal utility is shown using a countable state
space Markov chain assumption. Stochastic approximation algorithms are used in [19], and diminishing
stepsize convex programming is used in [20] to treat problems that are more deterministic in structure.
The works [7][19][20] do not show the $[O(1/V), O(V)]$ performance-backlog tradeoff.

Primal-dual algorithms are considered for scheduling in wireless systems with "infinite backlog" in 
[21] [22] and shown to converge to a utility optimal operating point, although this work does not consider 
queueing or time average constraints. A related primal-dual algorithm is treated in [6] for systems with 
queues. A fluid version of the system is shown to have a utility-optimal trajectory, and it is conjectured that 
the actual system has a near-optimal utility. Recent work in [13] considers fluid analysis of primal-dual 
updates and proves near-optimal utility of the actual system with probability 1. It also treats a more general
class of objective functions that have time varying parameters. However, it considers only rate stability
for queues and does not specify the $[O(1/V), O(V)]$ tradeoff. Work in [23] considers stochastic queues
with a non-convex objective function, and shows that if the throughput vector converges, it converges to
a near-local optimum or a critical point with an $[O(1/V), O(V)]$ utility-delay tradeoff (where a near-local
optimum is a near-global optimum in the special case of convex problems). 

C. Paper Outline 

In the next section we review the basic drift-plus-penalty theorem and discuss the performance bounds
it provides, which are in terms of time average expectations. We then state the main theorem of this paper,
which shows the same bounds hold as pure time averages with probability 1. A key special case of this
theorem is that if a certain quadratic Lyapunov drift condition is satisfied, then the network queues satisfy
all of the six major forms of queue stability. Section III provides background on the Kolmogorov law
of large numbers needed in our analysis, and derives a simple but useful generalized drift-plus-penalty
theorem. Section IV shows that the conditions required for the generalized drift-plus-penalty theorem to
hold are satisfied under quadratic Lyapunov functions if certain boundedness properties hold. Section V
uses this result in queueing networks to derive bounds of the form (7)-(9) for those systems.
II. The Drift-Plus-Penalty Theorem

Let $Q(t) = (Q_1(t), Q_2(t), \ldots, Q_K(t))$ be a stochastic vector with real-valued components, and let $p(t)$ be
a real-valued stochastic process on the same probability space as $Q(t)$. These processes evolve in discrete
time with unit time slots $t \in \{0, 1, 2, \ldots\}$. The vector $Q(t)$ can represent queue backlogs in a network of
$K$ queues. The process $p(t)$ can represent a penalty process, where $p(t)$ is a real-valued penalty (such as
power expenditure) incurred by some control action taken by the system on slot $t$. While typical queue
backlogs and penalties are non-negative, for generality we allow them to possibly take negative values.

For each slot $t$, define $\mathcal{H}(t)$ as the history of past $Q(\tau)$ and $p(\tau)$ values, where $Q(\tau)$ values are taken
up to and including slot $t$, and $p(\tau)$ values are taken up to but not including slot $t$. Specifically, define
$\mathcal{H}(0) \triangleq \{Q(0)\}$, and for each $t > 0$ define:

$$\mathcal{H}(t) \triangleq \{Q(0), Q(1), \ldots, Q(t), p(0), p(1), \ldots, p(t-1)\} \tag{10}$$

As a scalar measure of the size of the $Q(t)$ vector, define the following quadratic Lyapunov function:

$$L(Q(t)) \triangleq \frac{1}{2}\sum_{k=1}^{K} w_k Q_k(t)^2 \tag{11}$$

where the constants $w_k$ are positive weights. Define $\Delta(\mathcal{H}(t))$ as the conditional Lyapunov drift:

$$\Delta(\mathcal{H}(t)) \triangleq \mathbb{E}\{L(Q(t+1)) - L(Q(t)) \mid \mathcal{H}(t)\} \tag{12}$$

Note that $\mathcal{H}(t)$ includes $Q(t)$, and so the above conditional expectation is with respect to the conditional
distribution of $Q(t+1)$ given $Q(0), \ldots, Q(t), p(0), \ldots, p(t-1)$.
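
As a rough illustration of the drift in (12), the following Python sketch computes the conditional drift exactly for a single queue under a toy model that is not taken from this paper: the update rule `Q(t+1) = max(Q(t) - s(t), 0) + a(t)`, the Bernoulli arrival and service variables, and the names `exact_drift`, `LAM`, `MU`, `B`, and `EPS` are all assumptions made for the example. Under these assumptions the drift is bounded by a constant minus a multiple of the backlog, which is the shape of the drift conditions used later in this paper.

```python
from itertools import product

def lyapunov(q, w=1.0):
    """Quadratic Lyapunov function L(Q) = (1/2) * w * Q^2 for a single queue."""
    return 0.5 * w * q * q

def exact_drift(q, lam, mu):
    """Exact conditional drift E{L(Q(t+1)) - L(Q(t)) | Q(t) = q}.

    Hypothetical model: Q(t+1) = max(Q(t) - s(t), 0) + a(t), where the arrival
    a(t) is Bernoulli(lam) and the service offer s(t) is Bernoulli(mu),
    independent of each other and of the past.
    """
    drift = 0.0
    for a, s in product((0, 1), repeat=2):
        prob = (lam if a else 1.0 - lam) * (mu if s else 1.0 - mu)
        q_next = max(q - s, 0) + a
        drift += prob * (lyapunov(q_next) - lyapunov(q))
    return drift

LAM, MU = 0.3, 0.6
B = 0.5 * (LAM + MU)   # equals E{a^2 + s^2}/2 for these Bernoulli variables
EPS = MU - LAM         # slack between service rate and arrival rate
drifts = [exact_drift(q, LAM, MU) for q in range(51)]

# Each drift value should sit below the linear bound B - EPS * q.
for q in (0, 1, 10, 50):
    print(q, round(drifts[q], 4), round(B - EPS * q, 4))
```

The bound used here follows from the elementary inequality `(max(Q - s, 0) + a)^2 <= Q^2 + a^2 + s^2 + 2Q(a - s)`, which holds for non-negative `Q`, `a`, and `s`.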

The drift-plus-penalty algorithm for minimizing the time average expected penalty $p(t)$ subject to queue
stability operates as follows: Every slot $t$ the network controller observes the current $\mathcal{H}(t)$ and chooses a
control policy that minimizes a bound on the following expression:¹

$$\Delta(\mathcal{H}(t)) + V\,\mathbb{E}\{p(t) \mid \mathcal{H}(t)\} \tag{13}$$

where $V$ is a non-negative control parameter that is chosen to affect a desired tradeoff between the average
penalty and the average queue backlog. A version of this algorithm was developed in [2] for maximizing
throughput-utility subject to stability, and simple modifications were presented for other contexts in [1][4].
This is a useful algorithm for queueing networks because it can typically be implemented based only on
$Q(t)$, without keeping a memory of the full history and without requiring knowledge of traffic rates or
channel probabilities (see applications in Section V). Such a control policy often gives rise to stochastic
processes $Q(t)$ and $p(t)$ that satisfy the following drift-plus-penalty condition for all slots $t \in \{0, 1, 2, \ldots\}$
and all possible $\mathcal{H}(t)$:

$$\Delta(\mathcal{H}(t)) + V\,\mathbb{E}\{p(t) \mid \mathcal{H}(t)\} \leq B + V p^* - \epsilon \sum_{k=1}^{K} |Q_k(t)| \tag{14}$$

where $B$, $p^*$, $\epsilon$ are finite constants, with $\epsilon > 0$. The value $p^*$ represents a target value for the time average
expectation of the penalty process $p(t)$. The following theorem from [1][2][4] shows that this condition
immediately implies the time average expectation of $p(t)$ is either below the target $p^*$, or is within a
distance of at most $O(1/V)$ above $p^*$, while ensuring time average expected queue backlog is $O(V)$.

¹ Strictly speaking, the prior work in [1] defines the Lyapunov drift by conditioning only on $Q(t)$, rather than on the full history $\mathcal{H}(t)$.
We condition on $\mathcal{H}(t)$ in this paper because such conditioning is needed for application of the Kolmogorov law of large numbers.
Theorem 1: (Lyapunov Optimization with Expectations [1][2][4]) Assume that $\mathbb{E}\{L(Q(0))\} < \infty$, and
that the condition (14) holds for some finite constants $B$, $p^*$, $V > 0$, and $\epsilon > 0$. If there is a finite (and
possibly negative) constant $p_{\min}$ such that $\mathbb{E}\{p(t)\} \geq p_{\min}$ for all slots $t \in \{0, 1, 2, \ldots\}$, then:

$$\limsup_{M\to\infty} \frac{1}{M}\sum_{t=0}^{M-1} \mathbb{E}\{p(t)\} \leq p^* + \frac{B}{V} \tag{15}$$

$$\limsup_{M\to\infty} \frac{1}{M}\sum_{t=0}^{M-1}\sum_{k=1}^{K} \mathbb{E}\{|Q_k(t)|\} \leq \frac{B + V(p^* - p_{\min})}{\epsilon} \tag{16}$$

Further, if (14) holds in the case $V = 0$, then inequality (16) still holds. Likewise, if (14) holds in the
case $\epsilon = 0$, then inequality (15) still holds.

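The proof of Theorem 1 requires only three lines and is repeated below to provide intuition: Taking
expectations of (14) and using the law of iterated expectations yields the following for all $t \in \{0, 1, 2, \ldots\}$:

$$\mathbb{E}\{L(Q(t+1))\} - \mathbb{E}\{L(Q(t))\} + V\,\mathbb{E}\{p(t)\} \leq B + V p^* - \epsilon \sum_{k=1}^{K} \mathbb{E}\{|Q_k(t)|\}$$

Summing the above over $t \in \{0, \ldots, M-1\}$ for some integer $M > 0$ and dividing by $M$ yields:

$$\frac{\mathbb{E}\{L(Q(M))\} - \mathbb{E}\{L(Q(0))\}}{M} + \frac{V}{M}\sum_{t=0}^{M-1} \mathbb{E}\{p(t)\} \leq B + V p^* - \frac{\epsilon}{M}\sum_{t=0}^{M-1}\sum_{k=1}^{K} \mathbb{E}\{|Q_k(t)|\}$$

Rearranging terms in the above inequality and using the fact that $\mathbb{E}\{L(Q(M))\} \geq 0$ and $\mathbb{E}\{p(t)\} \geq p_{\min}$
for all $t$ immediately leads to the following two inequalities:

$$\frac{1}{M}\sum_{t=0}^{M-1} \mathbb{E}\{p(t)\} \leq p^* + \frac{B}{V} + \frac{\mathbb{E}\{L(Q(0))\}}{VM}$$

$$\frac{1}{M}\sum_{t=0}^{M-1}\sum_{k=1}^{K} \mathbb{E}\{|Q_k(t)|\} \leq \frac{B + V(p^* - p_{\min})}{\epsilon} + \frac{\mathbb{E}\{L(Q(0))\}}{\epsilon M}$$

Taking a limit of the above inequalities as $M \to \infty$ yields (15)-(16).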
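
The tradeoff in Theorem 1 can be illustrated with a small simulation. The sketch below is a toy model and not an algorithm from this paper: it assumes a single queue with Bernoulli arrivals, a two-state channel that serves 2 packets (good) or 1 packet (bad) when one unit of power is spent, and the Lyapunov function $L(Q) = Q^2/2$. The function name `simulate` and all parameter values are hypothetical. Under these assumptions, minimizing the usual bound on (13) reduces to transmitting exactly when $Q(t)\mu(t) > V$.

```python
import random

def simulate(v, lam=0.5, p_good=0.5, slots=20000, seed=1):
    """Single-queue drift-plus-penalty sketch under an assumed toy model.

    Returns (average power, average backlog, maximum backlog).
    """
    rng = random.Random(seed)
    q = 0
    power_sum = backlog_sum = max_q = 0
    for _ in range(slots):
        mu = 2 if rng.random() < p_good else 1   # packets served if we transmit
        p = 1 if q * mu > v else 0               # minimizes V*p - Q(t)*mu*p over p in {0, 1}
        a = 1 if rng.random() < lam else 0       # Bernoulli(lam) arrival
        power_sum += p
        backlog_sum += q
        q = max(q - p * mu, 0) + a               # queue update
        max_q = max(max_q, q)
    return power_sum / slots, backlog_sum / slots, max_q

results = {v: simulate(v) for v in (0, 10, 50)}
for v, (avg_power, avg_backlog, max_q) in results.items():
    print(f"V={v:3d}  avg power={avg_power:.3f}  avg backlog={avg_backlog:.2f}  max backlog={max_q}")
```

In this toy model, a larger `v` makes the controller wait for good channel states, which lowers average power while the backlog grows roughly in proportion to `v`. The backlog here also happens to stay below `v + 1` deterministically, which is a property of this particular model rather than a consequence of Theorem 1.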



A. Main Result of This Paper 

Theorem 1 illustrates an important tradeoff between time average expected penalty and the resulting
time average expected queue backlog. However, one may wonder if the same bounds hold with probability
1 for pure time averages (without the expectations). To address this question, we impose the following
additional boundedness assumptions:

• The second moments $\mathbb{E}\{p(t)^2\}$ are finite for all $t \in \{0, 1, 2, \ldots\}$ and satisfy:

$$\sum_{\tau=1}^{\infty} \frac{\mathbb{E}\{p(\tau)^2\}}{\tau^2} < \infty \tag{17}$$

• There is a finite (possibly negative) constant $p_{\min}$ such that for all slots $t$ and all possible $\mathcal{H}(t)$:

$$\mathbb{E}\{p(t) \mid \mathcal{H}(t)\} \geq p_{\min} \tag{18}$$

• There is a finite constant $D > 0$ such that for all slots $t$, all possible $Q(t)$, and all $k \in \{1, \ldots, K\}$
the conditional fourth moments of queue changes are bounded as follows:

$$\mathbb{E}\{(Q_k(t+1) - Q_k(t))^4 \mid Q(t)\} \leq D \tag{19}$$

Note that condition (17) holds whenever $\mathbb{E}\{p(t)^2\} \leq C$ for all $t$ for some finite constant $C > 0$. The
following theorem is the main result of this paper:

Theorem 2: (Lyapunov Optimization with Pure Time Averages) Assume the boundedness assumptions
(17)-(19) hold. Let $L(Q(t))$ be a quadratic Lyapunov function of the form (11), and assume the initial
queue backlog $Q(0)$ is finite with probability 1. If the drift-plus-penalty condition (14) is satisfied for all
slots $t$ and all possible $\mathcal{H}(t)$ (with finite constants $B$, $p^*$, $V > 0$, $\epsilon > 0$), then:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} p(\tau) \leq p^* + \frac{B}{V} \quad \text{(w.p.1)} \tag{20}$$

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K} |Q_k(\tau)| \leq \frac{B + V(p^* - p_{\min})}{\epsilon} \quad \text{(w.p.1)} \tag{21}$$

where (w.p.1) stands for "with probability 1." Further, for all $k \in \{1, \ldots, K\}$ we have:

$$\lim_{t\to\infty} \frac{Q_k(t)}{t} = 0 \quad \text{(w.p.1)} \tag{22}$$

Finally, if (14) holds in the case $V = 0$, then inequality (21) and equality (22) still hold.

A more detailed upper bound on time average queue backlog is provided in (52) of the proof.



B. Queue Stability 

A special case of Theorem 2 is when the fourth moment condition (19) is satisfied and when the
following drift condition holds for all $t$ and all $\mathcal{H}(t)$:

$$\Delta(\mathcal{H}(t)) \leq B - \epsilon \sum_{k=1}^{K} |Q_k(t)| \tag{23}$$

where $B \geq 0$ and $\epsilon > 0$. This is a special case of (14) with $V = 0$ and $p(t) = p^* = 0$. In this case we
have that all queues $Q_k(t)$ in the system satisfy:

$$\limsup_{M\to\infty} \frac{1}{M}\sum_{t=0}^{M-1} \mathbb{E}\{|Q_k(t)|\} \leq B/\epsilon \tag{24}$$

$$\limsup_{M\to\infty} \frac{1}{M}\sum_{t=0}^{M-1} |Q_k(t)| \leq B/\epsilon \quad \text{(w.p.1)} \tag{25}$$

$$\lim_{q\to\infty} \left[\limsup_{M\to\infty} \frac{1}{M}\sum_{t=0}^{M-1} \Pr[|Q_k(t)| > q]\right] = 0 \tag{26}$$

$$\lim_{q\to\infty} \left[\limsup_{M\to\infty} \frac{1}{M}\sum_{t=0}^{M-1} 1\{|Q_k(t)| > q\}\right] = 0 \quad \text{(w.p.1)} \tag{27}$$

$$\lim_{t\to\infty} \mathbb{E}\{|Q_k(t)|\}/t = 0 \tag{28}$$

$$\lim_{t\to\infty} Q_k(t)/t = 0 \quad \text{(w.p.1)} \tag{29}$$

where $1\{|Q_k(t)| > q\}$ is an indicator function that is 1 if $|Q_k(t)| > q$, and 0 else. The above are 6 major
forms of queue stability. The inequality (24) is often called strong stability, and holds by Theorem 1. Its
sample path version is inequality (25), and this holds by Theorem 2. The inequality (24) can easily be
used to prove (26) via the fact that $|Q_k(t)| \geq q\,1\{|Q_k(t)| > q\}$, and the same fact can easily prove that (25)
implies (27). The stability definition (28) is called mean rate stability, and does not follow from any of
the above results, but follows from Theorem 3 given below. The stability definition (29) is a sample path
version called rate stability, and is implied by Theorem 2. Relationships between these various stability
definitions are discussed in [17]. In summary, if changes in queue backlogs have uniformly bounded
conditional fourth moments (so that (19) holds), and if the Lyapunov drift condition (23) holds for a
quadratic Lyapunov function, then all queues in the network satisfy all of the major forms of stability.
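
For readers who want to inspect these quantities on a recorded backlog trace, the following Python sketch computes finite-horizon versions of the sample path quantities appearing in (25), (27), and (29). The helper names `time_average_backlog`, `tail_fraction`, and `backlog_rate`, and the toy periodic path, are hypothetical and chosen only for illustration; a finite trace can of course only suggest, not prove, the limiting behavior.

```python
def time_average_backlog(path):
    """(1/M) * sum over t < M of |Q(t)|, the finite-M quantity inside (25)."""
    return sum(abs(q) for q in path) / len(path)

def tail_fraction(path, threshold):
    """(1/M) * sum over t < M of 1{|Q(t)| > threshold}, the quantity inside (27)."""
    return sum(1 for q in path if abs(q) > threshold) / len(path)

def backlog_rate(path):
    """Q(t)/t evaluated at the last recorded slot t = M - 1, as in (29)."""
    t = len(path) - 1
    return path[-1] / t

# A deterministic toy path that cycles 0, 1, 2, 3, 0, 1, 2, 3, ...
path = [t % 4 for t in range(4000)]

print(time_average_backlog(path))   # average backlog over the trace
print(tail_fraction(path, 2))       # fraction of slots with backlog above 2
print(backlog_rate(path))           # last backlog divided by last slot index
```

For this bounded periodic path the tail fraction drops to zero once the threshold reaches 3, and the rate `Q(t)/t` shrinks as the trace gets longer.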

The following useful theorem shows that in the special case $\epsilon = 0$, the condition (23) still implies rate
stability and mean rate stability, regardless of whether or not conditional fourth moments are bounded.

Theorem 3: (Rate Stability and Mean Rate Stability) Let $L(Q(t))$ be a quadratic Lyapunov function of
the form (11). Suppose there is a finite constant $B \geq 0$ such that for all $t \in \{0, 1, 2, \ldots\}$ and all possible
$\mathcal{H}(t)$, we have:²

$$\Delta(\mathcal{H}(t)) \leq B$$

Then:

(a) If $\mathbb{E}\{L(Q(0))\} < \infty$, then $Q_k(t)$ is mean rate stable for all $k \in \{1, \ldots, K\}$. That is:

$$\lim_{t\to\infty} \mathbb{E}\{|Q_k(t)|\}/t = 0$$

(b) If $Q(0)$ is finite with probability 1, and if there is a finite constant $D > 0$ such that for all
$t \in \{0, 1, 2, \ldots\}$ and all $k \in \{1, \ldots, K\}$ we have:

$$\mathbb{E}\{(Q_k(t+1) - Q_k(t))^2\} \leq D$$

then $Q_k(t)$ is rate stable for all $k \in \{1, \ldots, K\}$. That is:

$$\lim_{t\to\infty} Q_k(t)/t = 0 \quad \text{(w.p.1)}$$

Proof: See Appendix E. □

Theorem 3 only requires the (unconditional) second moment of queue changes to be bounded, whereas
Theorem 2 requires (conditional) fourth moments to be bounded.
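
The case $\epsilon = 0$ can be illustrated with a critically loaded queue, where arrivals and service have equal rates. The Python sketch below uses an assumed toy model (Bernoulli arrivals and service offers, both with probability 1/2, and the update `Q(t+1) = max(Q(t) - s(t), 0) + a(t)`); the function name `critically_loaded_rate` and the seed are hypothetical. For this model the quadratic drift is bounded by a constant and queue changes have bounded second moments, so the conditions of Theorem 3 are expected to hold even though there is no negative drift term.

```python
import random

def critically_loaded_rate(slots=100000, seed=11):
    """Simulate a queue with equal arrival and service rates; return Q(T)/T."""
    rng = random.Random(seed)
    q = 0
    for _ in range(slots):
        a = 1 if rng.random() < 0.5 else 0   # Bernoulli(1/2) arrival
        s = 1 if rng.random() < 0.5 else 0   # Bernoulli(1/2) service offer
        q = max(q - s, 0) + a
    return q / slots

rate = critically_loaded_rate()
print(rate)   # small when the horizon is long, consistent with rate stability
```

The backlog of such a queue typically grows on the order of the square root of the horizon, so `Q(T)/T` shrinks as `slots` increases, even though the time average backlog itself need not stay bounded.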

III. Convergence of Time Averages 

This section reviews basic convergence definitions and results needed in the proof of Theorem 2. It
then develops a generalized drift-plus-penalty result for processes with a certain variance property.

A. Discussion of Convergence With Probability 1 

Let $Y(t)$ be a real-valued stochastic process defined on $t \in \{0, 1, 2, \ldots\}$. To say that $Y(t)$ converges
to a constant $a \in \mathbb{R}$ "with probability 1" (or "almost surely"), we use the notation:

$$\lim_{t\to\infty} Y(t) = a \quad \text{(w.p.1)} \tag{30}$$

It is well known that (30) holds if and only if for all $\epsilon > 0$ we have:

$$\lim_{n\to\infty} \Pr\left[\cup_{t \geq n} \{|Y(t) - a| > \epsilon\}\right] = 0 \tag{31}$$

Probabilities of the type (31) can be bounded via the union bound:

$$0 \leq \Pr\left[\cup_{t \geq n} \{|Y(t) - a| > \epsilon\}\right] \leq \sum_{t=n}^{\infty} \Pr[|Y(t) - a| > \epsilon] \tag{32}$$

It follows that (31) holds if the infinite sum on the right-hand-side of (32) is the tail of a convergent
series. Bounds on each term of the series can be obtained via the well known Chebyshev inequality:

$$\Pr[|Y(t) - a| > \epsilon] \leq \frac{\mathbb{E}\{(Y(t) - a)^2\}}{\epsilon^2}$$

² The same results for Theorem 3 hold if the requirement "$\Delta(\mathcal{H}(t)) \leq B$" (which conditions on the full history $\mathcal{H}(t)$), is replaced with
"$\mathbb{E}\{L(Q(t+1)) - L(Q(t)) \mid Q(t)\} \leq B$" (which conditions only on $Q(t)$).

The above discussion explains the following well known lemma:

Lemma 1: If $Y(t)$ satisfies the following:

$$\sum_{t=1}^{\infty} \mathbb{E}\{(Y(t) - a)^2\} < \infty$$

then (30) holds, that is, the variables $Y(t)$ converge to $a$ with probability 1.

Corollary 1: (Rate Stability in Queues with Finite Variance) If $Q(t)$ is a real-valued stochastic process
defined over slots $t \in \{0, 1, 2, \ldots\}$ that satisfies:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{Q(t)^2\}}{t^2} < \infty$$

then:

$$\lim_{t\to\infty} \frac{Q(t)}{t} = 0 \quad \text{(w.p.1)}$$

In particular, this holds whenever there is a finite constant $C > 0$ such that $\mathbb{E}\{Q(t)^2\} \leq C$ for all $t$.

Proof: This corollary follows as an immediate consequence of Lemma 1 by defining $Y(t) \triangleq Q(t)/t$
and $a = 0$, so that $\mathbb{E}\{(Y(t) - a)^2\} = \mathbb{E}\{Q(t)^2\}/t^2$. The special case when $\mathbb{E}\{Q(t)^2\} \leq C$ follows
because $\sum_{t=1}^{\infty} C/t^2 < \infty$. □

B. Time Averages and the Kolmogorov Strong Law for Martingale Differences 

Let $X(t)$ be a real-valued stochastic process defined over timeslots $t \in \{0, 1, 2, \ldots\}$. Define the history
$\mathcal{H}_X(t)$ to be the set of values of the process before slot $t$, so that $\mathcal{H}_X(0)$ is the empty set, and for all
slots $t > 0$ we have:

$$\mathcal{H}_X(t) \triangleq \{X(0), X(1), \ldots, X(t-1)\} \tag{33}$$

We first assume the process $X(t)$ has the property $\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\} = 0$ for all $t$ and all possible $\mathcal{H}_X(t)$.
Such processes are called martingale differences. The following theorem is a well known variation on the
Kolmogorov strong law of large numbers.

Theorem 4: (Kolmogorov strong law for martingale differences [15][24][25]) Suppose that $X(t)$ is a
stochastic process over $t \in \{0, 1, 2, \ldots\}$ such that:

• $\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\} = 0$ for all $t$ and all $\mathcal{H}_X(t)$, where $\mathcal{H}_X(t)$ is defined in (33).

• The second moments $\mathbb{E}\{X(t)^2\}$ are finite for all $t$ and satisfy:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{X(t)^2\}}{t^2} < \infty \tag{34}$$

Then:

$$\lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} X(\tau) = 0 \quad \text{(w.p.1)}$$

The following corollary follows easily from the Kolmogorov strong law given above.

Corollary 2: Let $X(t)$ be a stochastic process defined over slots $t \in \{0, 1, 2, \ldots\}$, and suppose that:

• There is a finite constant $B$ such that $\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\} \leq B$ for all $t$ and all $\mathcal{H}_X(t)$, where the history
$\mathcal{H}_X(t)$ is defined in (33).

• The second moments $\mathbb{E}\{X(t)^2\}$ are finite for all $t$ and satisfy:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{X(t)^2\}}{t^2} < \infty$$

Then:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} X(\tau) \leq B \quad \text{(w.p.1)}$$

Proof: The idea is to define the process $\tilde{X}(t) \triangleq X(t) - \mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\}$, and then apply the result of
Theorem 4 to the process $\tilde{X}(t)$. This is shown in Appendix A for completeness. □
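
Theorem 4 does not require independence, only a zero conditional mean given the past. The Python sketch below builds one hypothetical martingale difference sequence whose magnitude depends on its own history and reports its time average. The construction (a fair sign multiplied by a history-dependent scale in {1, 2}), the function name `martingale_difference_average`, and the seed are all assumptions made for illustration. Since $X(t)^2 \leq 4$ here, the summability condition (34) holds.

```python
import random

def martingale_difference_average(slots=200000, seed=7):
    """Time average of a bounded, history-dependent martingale difference sequence.

    X(t) = sign(t) * scale(t), where sign(t) is a fair +/-1 coin independent of
    the past and scale(t) depends only on X(t-1). Hence E{X(t) | past} = 0.
    """
    rng = random.Random(seed)
    total, prev = 0.0, 0.0
    for _ in range(slots):
        scale = 2.0 if prev > 0 else 1.0            # history-dependent magnitude
        sign = 1.0 if rng.random() < 0.5 else -1.0  # fair coin, independent of the past
        x = sign * scale
        total += x
        prev = x
    return total / slots

avg = martingale_difference_average()
print(avg)   # close to 0 for long horizons, as Theorem 4 suggests
```

The values `X(t)` are dependent across slots because the scale reacts to the previous sample, yet the time average still concentrates near zero.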
C. A Generalized Drift-Plus-Penalty Theorem

Now let $\Psi(t)$ be a non-negative stochastic process defined over slots $t \in \{0, 1, 2, \ldots\}$, and let $\beta(t)$
be another stochastic process defined on the same probability space and whose time average we want to
show is non-positive. The $\Psi(t)$ process can represent the values of a general Lyapunov function over
time $t \in \{0, 1, 2, \ldots\}$. Define $\delta(t) \triangleq \Psi(t+1) - \Psi(t)$ as the difference process. Define the history $\mathcal{H}(t)$ for
this system by:

$$\mathcal{H}(t) \triangleq \{\Psi(0), \ldots, \Psi(t), \beta(0), \ldots, \beta(t-1)\} \tag{35}$$

Theorem 5: (Generalized Drift-Plus-Penalty) Suppose $\Psi(0)$ is finite with probability 1, that $\mathbb{E}\{\delta(t)^2\}$
and $\mathbb{E}\{\beta(t)^2\}$ are finite for all $t$, and that:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{\delta(t)^2\} + \mathbb{E}\{\beta(t)^2\}}{t^2} < \infty$$

Further suppose that the following drift-plus-penalty condition holds for all $t$ and all possible $\mathcal{H}(t)$:

$$\mathbb{E}\{\delta(t) + \beta(t) \mid \mathcal{H}(t)\} \leq 0 \tag{36}$$

Then:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \beta(\tau) \leq 0 \quad \text{(w.p.1)}$$

Proof: Define $X(t) \triangleq \delta(t) + \beta(t)$. The idea is to apply Corollary 2 to the process $X(t)$. To this end,
we simply need to show that $X(t)$ satisfies the assumptions needed in Corollary 2. Note that the history
$\mathcal{H}(t)$ contains more information than the history $\mathcal{H}_X(t)$, defined:

$$\mathcal{H}_X(t) \triangleq \{X(0), X(1), \ldots, X(t-1)\}$$

Indeed, $\mathcal{H}_X(t)$ can be ascertained with knowledge of the more detailed history $\mathcal{H}(t)$. Thus, we can write
$\mathcal{H}(t) = \mathcal{H}(t) \cup \mathcal{H}_X(t)$, as adding the information $\mathcal{H}_X(t)$ does not create any new information. Thus, using
iterated expectations yields:

$$\mathbb{E}\{\mathbb{E}\{X(t) \mid \mathcal{H}(t)\} \mid \mathcal{H}_X(t)\} = \mathbb{E}\{\mathbb{E}\{X(t) \mid \mathcal{H}(t) \cup \mathcal{H}_X(t)\} \mid \mathcal{H}_X(t)\} = \mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\}$$

On the other hand, by (36) we have:

$$\mathbb{E}\{\mathbb{E}\{X(t) \mid \mathcal{H}(t)\} \mid \mathcal{H}_X(t)\} = \mathbb{E}\{\mathbb{E}\{\delta(t) + \beta(t) \mid \mathcal{H}(t)\} \mid \mathcal{H}_X(t)\} \leq \mathbb{E}\{0 \mid \mathcal{H}_X(t)\} = 0$$

Therefore, for all $t$ and all possible $\mathcal{H}_X(t)$ we have:

$$\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\} \leq 0$$

It remains only to show that:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{X(t)^2\}}{t^2} < \infty$$

Because $(\delta(t) + \beta(t))^2 \leq 2\delta(t)^2 + 2\beta(t)^2$, we have:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{X(t)^2\}}{t^2} \leq 2\sum_{t=1}^{\infty} \frac{\mathbb{E}\{\delta(t)^2 + \beta(t)^2\}}{t^2} < \infty$$

Thus, by Corollary 2 we have:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} X(\tau) \leq 0 \quad \text{(w.p.1)} \tag{37}$$

However, recalling that $X(t) = \Psi(t+1) - \Psi(t) + \beta(t)$, we have:

$$\sum_{\tau=0}^{t-1} X(\tau) = \Psi(t) - \Psi(0) + \sum_{\tau=0}^{t-1} \beta(\tau) \geq -\Psi(0) + \sum_{\tau=0}^{t-1} \beta(\tau)$$

where the final inequality holds because $\Psi(t) \geq 0$. Dividing the above inequality by $t$ yields:

$$\frac{1}{t}\sum_{\tau=0}^{t-1} X(\tau) \geq -\frac{\Psi(0)}{t} + \frac{1}{t}\sum_{\tau=0}^{t-1} \beta(\tau)$$

Taking a lim sup of the above as $t \to \infty$ and using (37) proves the result. □

IV. The Lyapunov Optimization Theorem: Proving Theorem 2

Consider the stochastic processes $Q(t) = (Q_1(t), \ldots, Q_K(t))$ and $p(t)$ as described in Section II.
Consider the quadratic Lyapunov function $L(Q(t))$ defined in (11), repeated again here for convenience:

$$L(Q(t)) = \frac{1}{2}\sum_{k=1}^{K} w_k Q_k(t)^2$$

where $w_k > 0$ for all $k$. Define $\|Q(t)\|$ by:

$$\|Q(t)\| \triangleq \sqrt{L(Q(t))} = \sqrt{\frac{1}{2}\sum_{k=1}^{K} w_k Q_k(t)^2}$$

It is not difficult to show that:

$$\sum_{k=1}^{K} \sqrt{w_k}\,|Q_k(t)| \geq \|Q(t)\| \tag{38}$$

Further, for any vectors $a$, $b$ we have:

$$\|a + b\| \leq \|a\| + \|b\| \tag{39}$$

Define the drift $\Delta(\mathcal{H}(t))$ according to (12), where the history $\mathcal{H}(t)$ is defined in (10). Define the
Lyapunov difference process $\delta(t) \triangleq L(Q(t+1)) - L(Q(t))$, and note by definition that:

$$\mathbb{E}\{\delta(t) \mid \mathcal{H}(t)\} = \Delta(\mathcal{H}(t)) \tag{40}$$

Define $d_k(t)$ as the queue difference process:

$$d_k(t) \triangleq Q_k(t+1) - Q_k(t)$$

We will bound the time averages of $p(t)$ and $Q_k(t)$ when the following drift-plus-penalty condition holds
for all $t$ and all $\mathcal{H}(t)$:

$$\Delta(\mathcal{H}(t)) + V\,\mathbb{E}\{p(t) \mid \mathcal{H}(t)\} \leq B + V p^* - \epsilon \sum_{k=1}^{K} |Q_k(t)| \tag{41}$$
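for some finite constants $B$, $p^*$, $V$, $\epsilon$. To this end, we define $\Psi(t) \triangleq L(Q(t))$ and $\beta(t)$ as follows:

$$\beta(t) \triangleq V p(t) - B - V p^* + \epsilon \sum_{k=1}^{K} |Q_k(t)| \tag{42}$$

The idea is to show that the assumptions needed in the generalized drift-plus-penalty theorem (Theorem
5) hold for these definitions of $\Psi(t)$ and $\beta(t)$.

Theorem 6: Suppose that the boundedness assumptions (17) and (19) hold. Suppose that $\mathbb{E}\{Q_k(t)^2\}$
is finite for all $k$ and all $t$, and that for all $k \in \{1, \ldots, K\}$ we have:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{Q_k(t)^2\}}{t^2} < \infty \tag{43}$$

Define the quadratic Lyapunov function $L(Q(t))$ as in (11), and suppose there are constants $B$, $p^*$, $V \geq 0$,
$\epsilon \geq 0$ for which the drift-plus-penalty condition (41) holds for all $t$ and all possible $\mathcal{H}(t)$. Then:

a) If $V > 0$ we have:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} p(\tau) \leq p^* + \frac{B}{V} \quad \text{(w.p.1)} \tag{44}$$

b) If $\epsilon > 0$, we have:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K} |Q_k(\tau)| \leq \frac{B}{\epsilon} + \frac{V}{\epsilon}\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} [p^* - p(\tau)] \quad \text{(w.p.1)} \tag{45}$$

Proof: Define $\Psi(t) \triangleq L(Q(t))$, $\delta(t) \triangleq L(Q(t+1)) - L(Q(t))$ and define $\beta(t)$ as in (42). For all $t$ and
all $\mathcal{H}(t)$ we have:

$$\mathbb{E}\{\delta(t) + \beta(t) \mid \mathcal{H}(t)\} = \Delta(\mathcal{H}(t)) + \mathbb{E}\{\beta(t) \mid \mathcal{H}(t)\} \tag{46}$$

$$= \Delta(\mathcal{H}(t)) + V\,\mathbb{E}\{p(t) \mid \mathcal{H}(t)\} - B - V p^* + \epsilon \sum_{k=1}^{K} |Q_k(t)| \tag{47}$$

$$\leq 0 \tag{48}$$

where (46) follows from (40), (47) follows by definition of $\beta(t)$ and the fact that $\mathbb{E}\{|Q_k(t)| \mid \mathcal{H}(t)\} =
|Q_k(t)|$, and (48) follows from (41).

Claim 1:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{\delta(t)^2\} + \mathbb{E}\{\beta(t)^2\}}{t^2} < \infty$$

This claim is proven in Appendix B. Assuming the result of the claim, we know that all conditions for
the $\Psi(t)$ and $\beta(t)$ processes needed to apply Theorem 5 hold. We thus conclude:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \beta(\tau) \leq 0 \quad \text{(w.p.1)}$$

That is:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \left[V p(\tau) - B - V p^* + \epsilon \sum_{k=1}^{K} |Q_k(\tau)|\right] \leq 0 \quad \text{(w.p.1)} \tag{49}$$

First assume that $V > 0$. Neglecting the non-negative term $\epsilon \sum_{k=1}^{K} |Q_k(\tau)|$ from (49) and dividing by $V$
yields:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} [p(\tau) - B/V - p^*] \leq 0 \quad \text{(w.p.1)}$$

This proves (44).

Now note that for any functions $f(t)$, $g(t)$, we have:³

$$\limsup_{t\to\infty} [f(t) - g(t)] \leq 0 \implies \limsup_{t\to\infty} f(t) \leq \limsup_{t\to\infty} g(t)$$

Defining $f(t) \triangleq \frac{\epsilon}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K} |Q_k(\tau)|$ and $g(t) \triangleq \frac{1}{t}\sum_{\tau=0}^{t-1} [B + V(p^* - p(\tau))]$, it follows from (49) that:

$$\limsup_{t\to\infty} \frac{\epsilon}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K} |Q_k(\tau)| \leq \limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} [B + V(p^* - p(\tau))] \quad \text{(w.p.1)}$$

If $\epsilon > 0$, we can divide the above by $\epsilon$ to prove (45). □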
Theorem 7: Suppose we have a quadratic Lyapunov function $L(Q(t))$ as defined in (11), and that
assumption (19) holds, so that $\mathbb{E}\{d_k(t)^4 \mid Q(t)\} \leq D$ for all $t$ and for some finite constant $D$, where
$d_k(t) = Q_k(t+1) - Q_k(t)$. Suppose that $\mathbb{E}\{\|Q(0)\|^4\} < \infty$. Suppose that there is an $\epsilon > 0$ and a
constant $B \geq 0$ such that:

$$\Delta(\mathcal{H}(t)) \leq B - \epsilon \sum_{k=1}^{K} |Q_k(t)| \tag{50}$$

Then:

a) There are constants $c > 0$ and $a > 0$ such that whenever $\|Q(t)\| \geq a$, we have:

$$\mathbb{E}\{\|Q(t+1)\| \mid Q(t)\} \leq \|Q(t)\| - c$$

b) There is a finite constant $b > 0$ such that for all $M \in \{1, 2, \ldots\}$ we have:

$$\frac{1}{M}\sum_{t=0}^{M-1} \mathbb{E}\{\|Q(t)\|^3\} \leq b$$

c) For all $k \in \{1, \ldots, K\}$ we have:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{Q_k(t)^2\}}{t^2} < \infty$$

d) For all $k \in \{1, \ldots, K\}$ we have:

$$\lim_{t\to\infty} \frac{Q_k(t)}{t} = 0 \quad \text{(w.p.1)}$$

e) We have:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K} |Q_k(\tau)| \leq \frac{B}{\epsilon} \quad \text{(w.p.1)}$$

Proof: The proofs of parts (a) and (b) closely follow a similar result derived for exponential Lyapunov
functions with deterministically bounded queue changes in [26], and are provided in Appendix C. To
prove parts (c), (d), (e), we have from part (b) that for all $M \in \{1, 2, 3, \ldots\}$:

$$\sum_{t=0}^{M-1} \mathbb{E}\{\|Q(t)\|^3\} \leq bM \tag{51}$$

However, we have $\|Q(t)\|^3 \geq \|Q(t)\|^2 - 1$. Using this with (51) gives:

$$\sum_{t=0}^{M-1} \left(\mathbb{E}\{\|Q(t)\|^2\} - 1\right) \leq bM$$

³ This follows by: $\limsup_{t\to\infty} f(t) = \limsup_{t\to\infty} [g(t) + (f(t) - g(t))] \leq \limsup_{t\to\infty} g(t) + \limsup_{t\to\infty} (f(t) - g(t))$.

and so:

$$\sum_{t=0}^{M-1} \mathbb{E}\{\|Q(t)\|^2\} \leq (b+1)M$$

Using $\frac{w_k}{2} Q_k(t)^2 \leq \|Q(t)\|^2$ in the above inequality proves that there is a finite constant $C > 0$ such that
for all $k \in \{1, \ldots, K\}$ we have:

$$\sum_{t=0}^{M-1} \mathbb{E}\{Q_k(t)^2\} \leq CM \quad \forall M \in \{1, 2, 3, \ldots\}$$

Lemma 4 in Appendix D shows that the above inequality implies the result of part (c).

Part (d) follows immediately from the result of part (c) together with Corollary 1. To prove part (e),
we note that the result of part (c) implies that the conditions for Theorem 6 are met for the case $\epsilon > 0$,
$p(t) = p^* = V = 0$, and the same constant $B$, which yields the result. □



A. Completing the proof of Theorem 2

Suppose now the assumptions of Theorem 2 hold, so that the drift-plus-penalty condition (14) is satisfied
for all $t$ and all $\mathcal{H}(t)$, and the boundedness assumptions (17)-(19) hold. We temporarily also assume that the
initial state $Q(0)$ is deterministically given as some constant vector (so that $\mathbb{E}\{\|Q(0)\|^4\} = \|Q(0)\|^4 <
\infty$). The condition (14) together with the fact that $\mathbb{E}\{p(t) \mid \mathcal{H}(t)\} \geq p_{\min}$ implies:

$$\Delta(\mathcal{H}(t)) \leq B + V(p^* - p_{\min}) - \epsilon \sum_{k=1}^{K} |Q_k(t)|$$

Defining $\tilde{B} \triangleq B + V(p^* - p_{\min})$, by Theorem 7 (applied with $\tilde{B}$ in place of $B$) we know all queues are
rate stable, that is, $\lim_{t\to\infty} Q_k(t)/t = 0$ with probability 1. We also know by Theorem 7 that:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{Q_k(t)^2\}}{t^2} < \infty$$

Then all assumptions are satisfied to apply Theorem 6, and so we have that:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} p(\tau) \leq p^* + \frac{B}{V} \quad \text{(w.p.1)}$$

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K} |Q_k(\tau)| \leq \frac{B}{\epsilon} + \frac{V}{\epsilon}\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} [p^* - p(\tau)] \quad \text{(w.p.1)} \tag{52}$$

Because $\mathbb{E}\{-p(t) \mid \mathcal{H}(t)\} \leq -p_{\min}$ for all $t$ and all $\mathcal{H}(t)$, and $\sum_{t=1}^{\infty} \mathbb{E}\{p(t)^2\}/t^2 < \infty$, we know by
Corollary 2 that:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} [p^* - p(\tau)] \leq p^* - p_{\min} \quad \text{(w.p.1)}$$

This together with (52) proves (21). Thus, all desired performance bounds hold with probability 1 under
the assumption that the initial queue vector is some finite value $Q(0)$. Because these bounds do not
depend on $Q(0)$, it follows that these same bounds hold (with probability 1) if $Q(0)$ is chosen randomly,
provided that $Q(0)$ is finite with probability 1.
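B. Variations on Theorem 2

Suppose there are processes $B(t)$, $p(t)$, $p^*(t)$, $Q(t)$ and constants $V > 0$, $\epsilon > 0$ such that for all $t$ and all possible $\mathcal{H}(t)$, we have:

$$\Delta(\mathcal{H}(t)) + V\mathbb{E}\{p(t) \mid \mathcal{H}(t)\} \le \mathbb{E}\{B(t) \mid \mathcal{H}(t)\} + V\mathbb{E}\{p^*(t) \mid \mathcal{H}(t)\} - \epsilon\sum_{k=1}^{K} |Q_k(t)| \qquad (53)$$

This is a variation on the drift-plus-penalty condition (14) that uses a time-varying $p^*(t)$ and $B(t)$. Suppose that $Q(0)$ is finite with probability 1, and that:

• Second moments of $p(t)$, $B(t)$, and $p^*(t)$ are finite for all $t$, and:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{[V(p(t) - p^*(t)) - B(t)]^2\}}{t^2} < \infty$$

• There is a constant $\beta_{min}$ such that for all $t$ and all $\mathcal{H}(t)$:

$$\mathbb{E}\{V(p(t) - p^*(t)) - B(t) \mid \mathcal{H}(t)\} \ge \beta_{min}$$

• There is a constant $D > 0$ such that for all $k \in \{1, \ldots, K\}$, all $t$, and all possible $Q(t)$:

$$\mathbb{E}\{(Q_k(t+1) - Q_k(t))^4 \mid Q(t)\} \le D$$

Then we can define $\tilde{B} \triangleq 0$, $\tilde{V} = 1$, $\beta(t) \triangleq V(p(t) - p^*(t)) - B(t)$, $\beta^* = 0$ to find:

$$\Delta(\mathcal{H}(t)) + \mathbb{E}\{\beta(t) \mid \mathcal{H}(t)\} \le -\epsilon\sum_{k=1}^{K} |Q_k(t)|$$

Then the conditions of Theorem 2 hold for $\beta(t)$ and $\beta^*$, and so we conclude (using (52)):

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \beta(\tau) \le 0 \quad \text{(w.p.1)}$$

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K} |Q_k(\tau)| \le \frac{1}{\epsilon}\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} [-\beta(\tau)] \quad \text{(w.p.1)}$$

Thus:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} [p(\tau) - p^*(\tau)] \le \frac{1}{V}\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} B(\tau) \quad \text{(w.p.1)}$$

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K} |Q_k(\tau)| \le \frac{1}{\epsilon}\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} [B(\tau) + V(p^*(\tau) - p(\tau))] \quad \text{(w.p.1)}$$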

V. Applications 

Here we illustrate an important application of Theorem [2] to optimization of time averages in stochastic 
queueing networks. This is the same scenario treated in [1]. However, while the work in [1] obtains 
bounds on the time average expectations via Theorem Q3 here we obtain bounds on the pure time averages 
via Theorem 2.

Consider a $K$ queue network with queue vector $Q(t) = (Q_1(t), \ldots, Q_K(t))$ that evolves in slotted time $t \in \{0, 1, 2, \ldots\}$ with update equation:

$$Q_k(t+1) = \max[Q_k(t) - b_k(t) + a_k(t), 0] \quad \forall k \in \{1, \ldots, K\} \qquad (54)$$

where $a_k(t)$ and $b_k(t)$ are arrival and service variables, respectively, for queue $k$. These are determined on slot $t$ by general functions $\hat{a}_k(\alpha(t), \omega(t))$, $\hat{b}_k(\alpha(t), \omega(t))$ of a network state $\omega(t)$ and a control action $\alpha(t)$:

$$a_k(t) = \hat{a}_k(\alpha(t), \omega(t)) \;,\quad b_k(t) = \hat{b}_k(\alpha(t), \omega(t))$$

where the control action $\alpha(t)$ is made every slot $t$ with knowledge of the current $\omega(t)$ and is chosen within some abstract set $\mathcal{A}_{\omega(t)}$. The $\omega(t)$ value can represent random arrival and channel state information on slot $t$, and $\alpha(t)$ can represent a resource allocation decision. For simplicity, assume the $\omega(t)$ process is i.i.d. over slots.

The control action additionally incurs a vector of penalties $y(t) = (y_0(t), y_1(t), \ldots, y_M(t))$, again given by general functions of $\alpha(t)$ and $\omega(t)$:

$$y_m(t) = \hat{y}_m(\alpha(t), \omega(t))$$

For $t > 0$, define $\bar{a}_k(t)$, $\bar{b}_k(t)$, $\bar{y}_m(t)$, $\bar{Q}_k(t)$ as time averages over the first $t$ slots:

$$\bar{a}_k(t) \triangleq \frac{1}{t}\sum_{\tau=0}^{t-1} a_k(\tau) \;,\; \bar{b}_k(t) \triangleq \frac{1}{t}\sum_{\tau=0}^{t-1} b_k(\tau) \;,\; \bar{y}_m(t) \triangleq \frac{1}{t}\sum_{\tau=0}^{t-1} y_m(\tau) \;,\; \bar{Q}_k(t) \triangleq \frac{1}{t}\sum_{\tau=0}^{t-1} Q_k(\tau)$$

The goal is to choose control actions $\alpha(t) \in \mathcal{A}_{\omega(t)}$ over time to solve the following stochastic network optimization problem:

Minimize: $\limsup_{t\to\infty} \bar{y}_0(t)$ (55)

Subject to: 1) $\limsup_{t\to\infty} \bar{Q}_k(t) < \infty \quad \forall k \in \{1, \ldots, K\}$ (56)

2) $\limsup_{t\to\infty} \bar{y}_m(t) \le 0 \quad \forall m \in \{1, \ldots, M\}$ (57)

3) $\alpha(t) \in \mathcal{A}_{\omega(t)} \quad \forall t \in \{0, 1, 2, \ldots\}$ (58)

Typical penalties can represent power expenditures. For example, suppose $y_m(t) = p_m(t) - p_m^{av}$, where $p_m(t)$ is the power incurred in component $m$ of the network on slot $t$, and $p_m^{av}$ is a required time average power expenditure. Then ensuring $\limsup_{t\to\infty} \bar{y}_m(t) \le 0$ ensures that $\limsup_{t\to\infty} \bar{p}_m(t) \le p_m^{av}$, so that the desired time average power constraint is met [4].

To ensure the time average penalty constraints are met, for each $m \in \{1, \ldots, M\}$ we define a virtual queue $Z_m(t)$ as follows:

$$Z_m(t+1) = \max[Z_m(t) + y_m(t), 0] \qquad (59)$$



It is easy to see that for any $t > 0$ we have:

$$Z_m(t) - Z_m(0) \ge \sum_{\tau=0}^{t-1} y_m(\tau)$$

and therefore, dividing by $t$ and rearranging terms yields:

$$\frac{1}{t}\sum_{\tau=0}^{t-1} y_m(\tau) \le \frac{Z_m(t)}{t} - \frac{Z_m(0)}{t}$$

It follows that if $Z_m(t)$ is rate stable for all $m$, so that $Z_m(t)/t \to 0$ with probability 1, then the constraint (57) is satisfied with probability 1.
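The virtual queue argument can be checked numerically on a sample path. The following is a minimal sketch, assuming a hypothetical penalty sequence `y` drawn from a Gaussian distribution with negative mean; the distribution, the seed, and the names `simulate_virtual_queue`, `y`, and `upper` are illustrative assumptions and are not part of the original analysis. It runs the update (59) and tests the inequality $Z_m(t) - Z_m(0) \ge \sum_{\tau=0}^{t-1} y_m(\tau)$ at every slot.

```python
import random

def simulate_virtual_queue(y, z0=0.0):
    """Run Z(t+1) = max(Z(t) + y(t), 0) and return [Z(0), ..., Z(T)]."""
    z = [z0]
    for y_t in y:
        z.append(max(z[-1] + y_t, 0.0))
    return z

random.seed(0)  # arbitrary seed, used only for repeatability
# Hypothetical penalty process with negative mean (an assumption of this sketch).
y = [random.gauss(-0.1, 1.0) for _ in range(10_000)]
z = simulate_virtual_queue(y)

# Sample-path inequality: Z(t) - Z(0) >= sum_{tau < t} y(tau) for every t > 0.
partial = 0.0
holds = True
for t in range(1, len(z)):
    partial += y[t - 1]
    holds = holds and (z[t] - z[0] >= partial - 1e-9)

time_avg_y = sum(y) / len(y)
upper = (z[-1] - z[0]) / len(y)  # Z(t)/t - Z(0)/t
print(holds, time_avg_y <= upper + 1e-9)
```

The time average of `y` is therefore bounded by `Z(t)/t - Z(0)/t`, which is why rate stability of the virtual queue is enough to enforce the constraint.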






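Now define $\Theta(t) \triangleq [Q(t), Z(t)]$ as the combined queue vector, and define the Lyapunov function:

$$L(\Theta(t)) \triangleq \frac{1}{2}\sum_{k=1}^{K} Q_k(t)^2 + \frac{1}{2}\sum_{m=1}^{M} Z_m(t)^2$$

The system history $\mathcal{H}(t)$ is defined:

$$\mathcal{H}(t) \triangleq \{\Theta(0), \Theta(1), \ldots, \Theta(t), y_0(0), y_0(1), \ldots, y_0(t-1)\}$$

The drift-plus-penalty algorithm thus seeks to minimize a bound on:

$$\Delta(\mathcal{H}(t)) + V\mathbb{E}\{\hat{y}_0(\alpha(t), \omega(t)) \mid \mathcal{H}(t)\}$$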

A. Computing the Drift-Plus-Penalty Inequality 

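Assume the functions $\hat{a}_k(\cdot)$, $\hat{b}_k(\cdot)$, $\hat{y}_0(\cdot)$ satisfy the following for all possible $\omega(t)$ and all possible $\alpha(t) \in \mathcal{A}_{\omega(t)}$:

$$0 \le \hat{a}_k(\alpha(t), \omega(t)) \;,\quad 0 \le \hat{b}_k(\alpha(t), \omega(t)) \;,\quad \hat{y}_0(\alpha(t), \omega(t)) \ge y_0^{min}$$

where $y_0^{min}$ is a deterministic lower bound on $y_0(t)$ for all $t$. Also assume that there is a finite constant $D > 0$ such that for all (possibly randomized) choices of $\alpha(t)$ in reaction to the i.i.d. $\omega(t)$ we have:

$$\mathbb{E}\{\hat{a}_k(\alpha(t), \omega(t))^4\} \le D \quad \forall k \in \{1, \ldots, K\} \qquad (60)$$

$$\mathbb{E}\{\hat{b}_k(\alpha(t), \omega(t))^4\} \le D \quad \forall k \in \{1, \ldots, K\} \qquad (61)$$

$$\mathbb{E}\{\hat{y}_m(\alpha(t), \omega(t))^4\} \le D \quad \forall m \in \{1, \ldots, M\} \qquad (62)$$

$$\mathbb{E}\{\hat{y}_0(\alpha(t), \omega(t))^2\} \le D \qquad (63)$$

where the expectations are taken with respect to the distribution of the i.i.d. $\omega(t)$ process, and the possibly randomized decisions $\alpha(t) \in \mathcal{A}_{\omega(t)}$.

By squaring (54) and (59) it is not difficult to show that the drift-plus-penalty expression satisfies the following bound (see [1]):

$$\Delta(\mathcal{H}(t)) + V\mathbb{E}\{\hat{y}_0(\alpha(t), \omega(t)) \mid \mathcal{H}(t)\} \le B + V\mathbb{E}\{\hat{y}_0(\alpha(t), \omega(t)) \mid \mathcal{H}(t)\} + \sum_{k=1}^{K} Q_k(t)\mathbb{E}\{\hat{a}_k(\alpha(t), \omega(t)) - \hat{b}_k(\alpha(t), \omega(t)) \mid \mathcal{H}(t)\} + \sum_{m=1}^{M} Z_m(t)\mathbb{E}\{\hat{y}_m(\alpha(t), \omega(t)) \mid \mathcal{H}(t)\} \qquad (64)$$

for some finite constant $B > 0$, representing a sum on the second moment bounds of the $a_k(t)$, $b_k(t)$, and $y_m(t)$ processes.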

B. The Dynamic Drift-Plus-Penalty Algorithm 

It is easy to show that the right-hand side of the inequality (64) is minimized by the policy that, every slot $t$, observes only the current queue values $Q(t)$, $Z(t)$ and the current $\omega(t)$ and chooses $\alpha(t) \in \mathcal{A}_{\omega(t)}$ to minimize the following expression:

$$V\hat{y}_0(\alpha(t), \omega(t)) + \sum_{k=1}^{K} Q_k(t)[\hat{a}_k(\alpha(t), \omega(t)) - \hat{b}_k(\alpha(t), \omega(t))] + \sum_{m=1}^{M} Z_m(t)\hat{y}_m(\alpha(t), \omega(t))$$

Then update the actual queues $Q_k(t)$ according to (54) and the virtual queues $Z_m(t)$ according to (59). This policy does not require knowledge of the probability distribution for $\omega(t)$. One difficulty is that it may not be possible to achieve the infimum of the above expression over the set $\mathcal{A}_{\omega(t)}$, because we are using general (possibly non-continuous) functions $\hat{a}_k(\alpha(t), \omega(t))$, $\hat{b}_k(\alpha(t), \omega(t))$, $\hat{y}_m(\alpha(t), \omega(t))$ and a general (possibly non-compact) set $\mathcal{A}_{\omega(t)}$. Thus, we simply assume there is a finite constant $C \ge 0$ such that our algorithm chooses $\alpha(t) \in \mathcal{A}_{\omega(t)}$ to come within an additive constant $C$ of the infimum on every slot $t$, so that:

$$V\hat{y}_0(\alpha(t), \omega(t)) + \sum_{k=1}^{K} Q_k(t)[\hat{a}_k(\alpha(t), \omega(t)) - \hat{b}_k(\alpha(t), \omega(t))] + \sum_{m=1}^{M} Z_m(t)\hat{y}_m(\alpha(t), \omega(t))$$

$$\le C + \inf_{\alpha \in \mathcal{A}_{\omega(t)}} \left[ V\hat{y}_0(\alpha, \omega(t)) + \sum_{k=1}^{K} Q_k(t)[\hat{a}_k(\alpha, \omega(t)) - \hat{b}_k(\alpha, \omega(t))] + \sum_{m=1}^{M} Z_m(t)\hat{y}_m(\alpha, \omega(t)) \right]$$

Such a choice of $\alpha(t)$ is called a $C$-additive approximation. The case $C = 0$ corresponds to achieving the exact infimum every slot.
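As an illustration of the per-slot decision, the following is a minimal sketch that assumes the action set is finite, so the infimum is attained and $C = 0$. The function name `dpp_choose`, the callables `a_hat`, `b_hat`, `y_hat`, and the toy single-queue model are hypothetical choices made for this example only.

```python
def dpp_choose(actions, omega, Q, Z, V, a_hat, b_hat, y_hat):
    """Pick the action minimizing
    V*y_0 + sum_k Q[k]*(a_k - b_k) + sum_m Z[m]*y_m over a finite action set."""
    def weight(alpha):
        y = y_hat(alpha, omega)   # tuple (y_0, y_1, ..., y_M)
        a = a_hat(alpha, omega)   # tuple (a_1, ..., a_K)
        b = b_hat(alpha, omega)   # tuple (b_1, ..., b_K)
        queue_term = sum(q * (ak - bk) for q, ak, bk in zip(Q, a, b))
        virtual_term = sum(z * ym for z, ym in zip(Z, y[1:]))
        return V * y[0] + queue_term + virtual_term
    return min(actions, key=weight)

# Toy model (assumed for illustration): one queue, no extra penalty constraints.
# omega = (arrivals, channel_rate); action 1 transmits with unit power, action 0 idles.
a_hat = lambda alpha, omega: (omega[0],)
b_hat = lambda alpha, omega: (alpha * omega[1],)
y_hat = lambda alpha, omega: (float(alpha),)   # y_0 is the power spent

omega = (1.0, 2.0)
small = dpp_choose([0, 1], omega, Q=[1.0], Z=[], V=5.0, a_hat=a_hat, b_hat=b_hat, y_hat=y_hat)
large = dpp_choose([0, 1], omega, Q=[10.0], Z=[], V=5.0, a_hat=a_hat, b_hat=b_hat, y_hat=y_hat)
print(small, large)
```

In this toy model the rule transmits only when the backlog times the channel rate exceeds $V$, which shows how a larger $V$ delays service in exchange for a lower penalty.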

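C. ω-only policies

Define an ω-only policy to be one that chooses $\alpha(t) \in \mathcal{A}_{\omega(t)}$ every slot $t$ according to a stationary and randomized decision based only on the observed $\omega(t)$ (in particular, being independent of $\mathcal{H}(t)$). Assume there exists an $\epsilon > 0$ and a particular ω-only policy $\alpha^*(t)$ that yields the following:

$$\mathbb{E}\{\hat{a}_k(\alpha^*(t), \omega(t)) - \hat{b}_k(\alpha^*(t), \omega(t))\} \le -\epsilon \quad \forall k \in \{1, \ldots, K\} \qquad (65)$$

$$\mathbb{E}\{\hat{y}_m(\alpha^*(t), \omega(t))\} \le -\epsilon \quad \forall m \in \{1, \ldots, M\} \qquad (66)$$

Under this assumption, it can be shown that the algorithm that uses the ω-only decisions $\alpha^*(t)$ every slot $t$ satisfies the constraints (56)-(58), and hence the problem (55)-(57) is feasible (meaning that its constraints are possible to satisfy). Further, this assumption (similar to a Slater assumption in convex optimization theory [27]) is only slightly stronger than what is required for feasibility. Indeed, it can be shown that if the problem (55)-(57) is feasible, then for all $\delta > 0$ there must be an ω-only algorithm that satisfies [28]:

$$\mathbb{E}\{\hat{a}_k(\alpha^*(t), \omega(t)) - \hat{b}_k(\alpha^*(t), \omega(t))\} \le \delta \quad \forall k \in \{1, \ldots, K\}$$

$$\mathbb{E}\{\hat{y}_m(\alpha^*(t), \omega(t))\} \le \delta \quad \forall m \in \{1, \ldots, M\}$$

Define $\epsilon_{max}$ as the supremum of all $\epsilon$ values for which an ω-only policy exists and satisfies (65)-(66). For $0 \le \epsilon \le \epsilon_{max}$, define $y_0^{opt}(\epsilon)$ as the infimum value of $y$ such that for all $\delta > 0$, there exists an ω-only policy $\alpha^*(t)$ that satisfies the following constraints:

$$\mathbb{E}\{\hat{y}_0(\alpha^*(t), \omega(t))\} \le y + \delta$$

$$\mathbb{E}\{\hat{a}_k(\alpha^*(t), \omega(t)) - \hat{b}_k(\alpha^*(t), \omega(t))\} \le -\epsilon + \delta \quad \forall k \in \{1, \ldots, K\}$$

$$\mathbb{E}\{\hat{y}_m(\alpha^*(t), \omega(t))\} \le -\epsilon + \delta \quad \forall m \in \{1, \ldots, M\}$$

It is not difficult to show that:

• These constraints are feasible whenever $0 \le \epsilon \le \epsilon_{max}$.

• The function $y_0^{opt}(\epsilon)$ is finite, continuous, and non-decreasing on the interval $0 \le \epsilon \le \epsilon_{max}$.

• The set of all such $y$ values that satisfy the above constraints is closed.

Thus, whenever $0 \le \epsilon \le \epsilon_{max}$, for any $\delta > 0$ there exists an ω-only algorithm $\alpha^*(t)$ such that:

$$\mathbb{E}\{\hat{y}_0(\alpha^*(t), \omega(t))\} \le y_0^{opt}(\epsilon) + \delta \qquad (67)$$

$$\mathbb{E}\{\hat{a}_k(\alpha^*(t), \omega(t)) - \hat{b}_k(\alpha^*(t), \omega(t))\} \le -\epsilon + \delta \quad \forall k \in \{1, \ldots, K\} \qquad (68)$$

$$\mathbb{E}\{\hat{y}_m(\alpha^*(t), \omega(t))\} \le -\epsilon + \delta \quad \forall m \in \{1, \ldots, M\} \qquad (69)$$

It can be shown that $y_0^{opt}(0)$ is the infimum time average penalty for $y_0(t)$ over all algorithms that meet the constraints (56)-(58) (not just ω-only algorithms) [4] [28]. Thus, we define $y_0^{opt} \triangleq y_0^{opt}(0)$.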






D. Performance Bounds 

Because our policy $\alpha(t)$ comes within $C \ge 0$ of minimizing the right-hand side of (64) every slot $t$ (given the observed $\mathcal{H}(t)$), we have for all $t$ and all possible $\mathcal{H}(t)$:

$$\Delta(\mathcal{H}(t)) + V\mathbb{E}\{\hat{y}_0(\alpha(t), \omega(t)) \mid \mathcal{H}(t)\} \le B + C + V\mathbb{E}\{\hat{y}_0(\alpha^*(t), \omega(t)) \mid \mathcal{H}(t)\} + \sum_{k=1}^{K} Q_k(t)\mathbb{E}\{\hat{a}_k(\alpha^*(t), \omega(t)) - \hat{b}_k(\alpha^*(t), \omega(t)) \mid \mathcal{H}(t)\} + \sum_{m=1}^{M} Z_m(t)\mathbb{E}\{\hat{y}_m(\alpha^*(t), \omega(t)) \mid \mathcal{H}(t)\}$$

where $\alpha^*(t)$ is any other decision that can be implemented on slot $t$. Now fix $\epsilon$ in the interval $0 < \epsilon \le \epsilon_{max}$. Fix any $\delta > 0$. Using the policy $\alpha^*(t)$ designed to achieve (67)-(69) and noting that this policy makes decisions independent of $\mathcal{H}(t)$ yields:

$$\Delta(\mathcal{H}(t)) + V\mathbb{E}\{\hat{y}_0(\alpha(t), \omega(t)) \mid \mathcal{H}(t)\} \le B + C + V(y_0^{opt}(\epsilon) + \delta) - (\epsilon - \delta)\sum_{k=1}^{K} Q_k(t) - (\epsilon - \delta)\sum_{m=1}^{M} Z_m(t)$$

The above holds for all $\delta > 0$. Taking a limit as $\delta \to 0$ yields:

$$\Delta(\mathcal{H}(t)) + V\mathbb{E}\{y_0(t) \mid \mathcal{H}(t)\} \le B + C + Vy_0^{opt}(\epsilon) - \epsilon\sum_{k=1}^{K} Q_k(t) - \epsilon\sum_{m=1}^{M} Z_m(t) \qquad (70)$$

where for simplicity we have substituted $y_0(t) = \hat{y}_0(\alpha(t), \omega(t))$ on the left-hand side. Inequality (70) is in the exact form of the drift-plus-penalty condition (14). Recall that the penalty $y_0(t)$ is deterministically lower bounded by some finite (possibly negative) value $y_0^{min}$. Further, the moment bounds (60)-(63) can easily be shown to imply that the boundedness assumptions (17)-(19) hold. Thus, we can apply Theorem 2 to conclude that all queues are rate stable (in particular, $Z_m(t)/t \to 0$ with probability 1 for all $m$), so that the constraints (57) are satisfied:

$$\limsup_{t\to\infty} \bar{y}_m(t) \le 0 \quad \forall m \in \{1, \ldots, M\} \quad \text{(w.p.1)}$$

Further:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} y_0(\tau) \le y_0^{opt}(\epsilon) + (B + C)/V \quad \text{(w.p.1)}$$

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}\left[\sum_{k=1}^{K} Q_k(\tau) + \sum_{m=1}^{M} Z_m(\tau)\right] \le \frac{B + C + V(y_0^{opt}(\epsilon) - y_0^{min})}{\epsilon} \quad \text{(w.p.1)}$$

However, the above two bounds hold for all $\epsilon$ such that $0 < \epsilon \le \epsilon_{max}$, and hence the two performance bounds can be optimized separately over this interval. Taking a limit as $\epsilon \to 0$ in the first bound and noting by continuity that $\lim_{\epsilon\to 0} y_0^{opt}(\epsilon) = y_0^{opt}(0) = y_0^{opt}$ yields:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} y_0(\tau) \le y_0^{opt} + (B + C)/V \quad \text{(w.p.1)} \qquad (71)$$

Using $\epsilon = \epsilon_{max}$ in the second bound yields:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}\left[\sum_{k=1}^{K} Q_k(\tau) + \sum_{m=1}^{M} Z_m(\tau)\right] \le \frac{B + C + V(y_0^{opt}(\epsilon_{max}) - y_0^{min})}{\epsilon_{max}} \quad \text{(w.p.1)} \qquad (72)$$

Thus, this simple dynamic algorithm satisfies the desired time average penalty constraints, stabilizes all queues $Q_k(t)$, and yields a time average penalty for $y_0(t)$ that is within $(B + C)/V$ of the optimal value $y_0^{opt}$. The performance gap $(B + C)/V$ can be made arbitrarily small by choosing the $V$ parameter large (as shown by (71)). The tradeoff is a time average queue backlog that is $O(V)$ (as shown by (72)).

By (52), the bound (72) can be improved, at the expense of sometimes making it less easy to compute, by replacing "$-y_0^{min}$" on the right-hand side with "$-\liminf_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} y_0(\tau)$." Further, we note that the concept of place-holder backlog from [5] is compatible with this analysis and can often be used together with the above to provide improved backlog bounds.
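The tradeoff between the two bounds can be tabulated directly. The following is a small sketch that evaluates the right-hand sides of (71) and (72) as functions of $V$; the function names and all numerical constants are illustrative assumptions, not values from the analysis above.

```python
def penalty_bound(y_opt, B, C, V):
    """Right-hand side of (71): y_opt + (B + C)/V."""
    return y_opt + (B + C) / V

def backlog_bound(y_opt_eps, y_min, B, C, V, eps):
    """Right-hand side of (72): [B + C + V*(y_opt(eps) - y_min)] / eps."""
    return (B + C + V * (y_opt_eps - y_min)) / eps

# Illustrative constants only (assumed for this example).
B, C, y_opt, y_opt_eps, y_min, eps = 4.0, 0.0, 1.0, 1.5, 0.0, 0.5
for V in (1.0, 10.0, 100.0):
    print(V, penalty_bound(y_opt, B, C, V), backlog_bound(y_opt_eps, y_min, B, C, V, eps))
```

Increasing $V$ by a factor of ten shrinks the penalty gap by the same factor, while the backlog bound grows roughly linearly in $V$.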

VI. Conclusions 

This work derives an extended drift-plus-penalty theorem for discrete time queueing systems. The 
theorem ensures all queues satisfy all major forms of stability, and that time averages meet desired 
constraints with probability 1. This extends prior results that were known to hold only for time average 
expectations. The boundedness conditions required for the theorem are mild and easily checked. In 
particular, the theorem applies to systems with an uncountably infinite number of possible events, to 
Markov systems with an uncountably infinite state space (possibly neither irreducible nor aperiodic), and 
to non-Markov systems. Our analysis combined the Kolmogorov law of large numbers for martingale 
differences with the drift-plus-penalty method from Lyapunov optimization. The results are applicable to 
a broad class of stochastic queueing networks, and are also useful in other contexts. 

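Appendix A — Proof of Corollary 2

Suppose the assumptions of Corollary 2 hold, so that $\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\} \le B$ for all $t$ and all $\mathcal{H}_X(t)$, and:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{X(t)^2\}}{t^2} < \infty$$

Define $\tilde{X}(t) \triangleq X(t) - \mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\}$. Clearly $\mathbb{E}\{\tilde{X}(t) \mid \mathcal{H}_X(t)\} = 0$ for all $t$ and all $\mathcal{H}_X(t)$. Now define $\mathcal{H}_{\tilde{X}}(t)$ as the history of the $\tilde{X}(t)$ process:

$$\mathcal{H}_{\tilde{X}}(t) \triangleq \{\tilde{X}(0), \ldots, \tilde{X}(t-1)\}$$

It is easy to see that conditioning on $\mathcal{H}_X(t)$ is the same as conditioning on $\mathcal{H}_{\tilde{X}}(t)$, because these provide the same information. Thus $\mathbb{E}\{\tilde{X}(t) \mid \mathcal{H}_{\tilde{X}}(t)\} = 0$ for all $t$ and all possible $\mathcal{H}_{\tilde{X}}(t)$. To apply the result of Theorem 4, we show that the second moment of $\tilde{X}(t)$ satisfies the condition (34). We have for all $t$:

$$\mathbb{E}\{\tilde{X}(t)^2\} = \mathbb{E}\{(X(t) - \mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\})^2\}$$

$$= \mathbb{E}\{X(t)^2\} + \mathbb{E}\{\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\}^2\} - 2\mathbb{E}\{X(t)\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\}\}$$

$$\le \mathbb{E}\{X(t)^2\} + \mathbb{E}\{\mathbb{E}\{X(t)^2 \mid \mathcal{H}_X(t)\}\} - 2\mathbb{E}\{X(t)\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\}\} \qquad (73)$$

$$= 2\mathbb{E}\{X(t)^2\} - 2\mathbb{E}\{X(t)\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\}\}$$

$$\le 2\mathbb{E}\{X(t)^2\} + 2\sqrt{\mathbb{E}\{X(t)^2\}\,\mathbb{E}\{\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\}^2\}} \qquad (74)$$

$$\le 2\mathbb{E}\{X(t)^2\} + 2\sqrt{\mathbb{E}\{X(t)^2\}\,\mathbb{E}\{\mathbb{E}\{X(t)^2 \mid \mathcal{H}_X(t)\}\}}$$

$$\le 2\mathbb{E}\{X(t)^2\} + 2\sqrt{\mathbb{E}\{X(t)^2\}}\sqrt{\mathbb{E}\{X(t)^2\}} = 4\mathbb{E}\{X(t)^2\}$$

where (73) follows by Jensen's inequality, and (74) follows by the Cauchy-Schwarz inner product inequality. It follows that:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{\tilde{X}(t)^2\}}{t^2} \le \sum_{t=1}^{\infty} \frac{4\mathbb{E}\{X(t)^2\}}{t^2} < \infty$$

Thus, the result of Theorem 4 holds for the process $\tilde{X}(t)$, and so:

$$\lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \tilde{X}(\tau) = 0 \quad \text{(w.p.1)}$$

That is:

$$\lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} [X(\tau) - \mathbb{E}\{X(\tau) \mid \mathcal{H}_X(\tau)\}] = 0 \quad \text{(w.p.1)} \qquad (75)$$

Using the fact that $\mathbb{E}\{X(t) \mid \mathcal{H}_X(t)\} \le B$ yields:

$$\frac{1}{t}\sum_{\tau=0}^{t-1} [X(\tau) - \mathbb{E}\{X(\tau) \mid \mathcal{H}_X(\tau)\}] \ge \frac{1}{t}\sum_{\tau=0}^{t-1} [X(\tau) - B]$$

Taking a lim sup of the above as $t \to \infty$ and using (75) yields:

$$\limsup_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} [X(\tau) - B] \le 0 \quad \text{(w.p.1)}$$

This proves the result.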



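Appendix B — Proof of Claim 1 in Theorem 6

Here we prove the Claim 1 needed in Theorem 6. Recall that $\delta(t) \triangleq L(Q(t+1)) - L(Q(t))$, where $L(Q(t))$ is defined in (11) with any weights $w_k > 0$. We prove Claim 1 with two lemmas.

Lemma 2: Suppose there is a finite constant $D > 0$ such that for all $t$ and all possible $Q(t)$ we have:

$$\mathbb{E}\{d_k(t)^4 \mid Q(t)\} \le D \quad \forall k \in \{1, \ldots, K\}, \forall t \in \{0, 1, 2, \ldots\}$$

Further suppose that:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{Q_k(t)^2\}}{t^2} < \infty \quad \forall k \in \{1, \ldots, K\} \qquad (76)$$

Then:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{\delta(t)^2\}}{t^2} < \infty$$

Proof: We have:

$$\delta(t) = \frac{1}{2}\sum_{k=1}^{K} w_k[Q_k(t+1)^2 - Q_k(t)^2] = \frac{1}{2}\sum_{k=1}^{K} w_k[2Q_k(t)d_k(t) + d_k(t)^2]$$

Thus:

$$\mathbb{E}\{\delta(t)^2\} = \sum_{k=1}^{K}\sum_{i=1}^{K} \frac{w_k w_i}{4}\mathbb{E}\{(2Q_k(t)d_k(t) + d_k(t)^2)(2Q_i(t)d_i(t) + d_i(t)^2)\}$$

Further:

$$\mathbb{E}\{(2Q_k(t)d_k(t) + d_k(t)^2)(2Q_i(t)d_i(t) + d_i(t)^2)\} = 4\mathbb{E}\{Q_k(t)Q_i(t)d_k(t)d_i(t)\} + \mathbb{E}\{d_k(t)^2 d_i(t)^2\} + 2\mathbb{E}\{Q_k(t)d_k(t)d_i(t)^2\} + 2\mathbb{E}\{Q_i(t)d_i(t)d_k(t)^2\}$$

$$\le 4\sqrt{\mathbb{E}\{Q_k(t)^2 d_k(t)^2\}\,\mathbb{E}\{Q_i(t)^2 d_i(t)^2\}} + \sqrt{\mathbb{E}\{d_k(t)^4\}\,\mathbb{E}\{d_i(t)^4\}} + 2\sqrt{\mathbb{E}\{Q_k(t)^2 d_k(t)^2\}\,\mathbb{E}\{d_i(t)^4\}} + 2\sqrt{\mathbb{E}\{Q_i(t)^2 d_i(t)^2\}\,\mathbb{E}\{d_k(t)^4\}}$$

Because $\mathbb{E}\{d_k(t)^4 \mid Q(t)\} \le D$ for all possible $Q(t)$, we have from iterated expectations that for all $k \in \{1, \ldots, K\}$:

$$\mathbb{E}\{d_k(t)^4\} \le D$$

Further, for all $k \in \{1, \ldots, K\}$ we have:

$$\mathbb{E}\{Q_k(t)^2 d_k(t)^2\} = \mathbb{E}\{\mathbb{E}\{Q_k(t)^2 d_k(t)^2 \mid Q(t)\}\}$$

$$= \mathbb{E}\{Q_k(t)^2\,\mathbb{E}\{d_k(t)^2 \mid Q(t)\}\}$$

$$\le \mathbb{E}\{Q_k(t)^2\sqrt{\mathbb{E}\{d_k(t)^4 \mid Q(t)\}}\}$$

$$\le \mathbb{E}\{Q_k(t)^2\sqrt{D}\}$$

$$\le \sqrt{D}\,\mathbb{E}\{Q_{max}(t)^2\}$$

where we define $Q_{max}(t)^2 \triangleq \max_{k \in \{1, \ldots, K\}} Q_k(t)^2$. Thus:

$$\mathbb{E}\{(2Q_k(t)d_k(t) + d_k(t)^2)(2Q_i(t)d_i(t) + d_i(t)^2)\} \le 4\sqrt{D}\,\mathbb{E}\{Q_{max}(t)^2\} + D + 4D^{3/4}\sqrt{\mathbb{E}\{Q_{max}(t)^2\}}$$

$$\le D_1\mathbb{E}\{Q_{max}(t)^2\} + D_2$$

for some positive constants $D_1$, $D_2$. Thus:

$$\mathbb{E}\{\delta(t)^2\} \le (D_1\mathbb{E}\{Q_{max}(t)^2\} + D_2)\sum_{k=1}^{K}\sum_{i=1}^{K} \frac{w_k w_i}{4}$$

$$\le D_3 + D_4\sum_{k=1}^{K} \mathbb{E}\{Q_k(t)^2\}$$

for some positive constants $D_3$, $D_4$. Thus:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{\delta(t)^2\}}{t^2} \le \sum_{t=1}^{\infty} \frac{D_3}{t^2} + D_4\sum_{k=1}^{K}\sum_{t=1}^{\infty} \frac{\mathbb{E}\{Q_k(t)^2\}}{t^2} < \infty$$

□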



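Now fix any constants $V$, $B$, $p^*$, $\epsilon$, and recall that $\beta(t)$ is defined:

$$\beta(t) \triangleq Vp(t) - B - Vp^* + \epsilon\sum_{k=1}^{K} |Q_k(t)|$$

Lemma 3: Suppose that for all $k \in \{1, \ldots, K\}$ we have:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{Q_k(t)^2\}}{t^2} < \infty \qquad (78)$$

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{p(t)^2\}}{t^2} < \infty \qquad (79)$$

Then:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{\beta(t)^2\}}{t^2} < \infty$$

Note that Lemmas 2 and 3 together prove Claim 1. It remains only to prove Lemma 3.

Proof: (Lemma 3) Expanding the square, we have:

$$\mathbb{E}\{\beta(t)^2\} = \mathbb{E}\{(Vp(t) - B - Vp^*)^2\} + \epsilon^2\sum_{k=1}^{K}\sum_{i=1}^{K} \mathbb{E}\{|Q_k(t)||Q_i(t)|\} + 2\epsilon\sum_{k=1}^{K} \mathbb{E}\{(Vp(t) - B - Vp^*)|Q_k(t)|\}$$

$$\le \mathbb{E}\{(Vp(t) - B - Vp^*)^2\} + \epsilon^2\sum_{k=1}^{K}\sum_{i=1}^{K} \sqrt{\mathbb{E}\{Q_k(t)^2\}\,\mathbb{E}\{Q_i(t)^2\}} + 2\epsilon\sum_{k=1}^{K} \sqrt{\mathbb{E}\{(Vp(t) - B - Vp^*)^2\}\,\mathbb{E}\{Q_k(t)^2\}}$$

However, because $|ab| \le \frac{1}{2}[a^2 + b^2]$ for all real numbers $a$, $b$, we have:

$$\sqrt{\mathbb{E}\{(Vp(t) - B - Vp^*)^2\}\,\mathbb{E}\{Q_k(t)^2\}} \le \frac{1}{2}\mathbb{E}\{(Vp(t) - B - Vp^*)^2\} + \frac{1}{2}\mathbb{E}\{Q_k(t)^2\}$$

Thus:

$$\mathbb{E}\{\beta(t)^2\} \le \mathbb{E}\{(Vp(t) - B - Vp^*)^2\} + \epsilon^2\sum_{k=1}^{K}\sum_{i=1}^{K} \sqrt{\mathbb{E}\{Q_k(t)^2\}\,\mathbb{E}\{Q_i(t)^2\}} + \epsilon\sum_{k=1}^{K} \mathbb{E}\{(Vp(t) - B - Vp^*)^2\} + \epsilon\sum_{k=1}^{K} \mathbb{E}\{Q_k(t)^2\}$$

$$\le (1 + \epsilon K)\mathbb{E}\{(Vp(t) - B - Vp^*)^2\} + (\epsilon^2 K^2 + \epsilon K)\mathbb{E}\{Q_{max}(t)^2\}$$

where we define $Q_{max}(t)^2 \triangleq \max_{k \in \{1, \ldots, K\}} Q_k(t)^2$. It follows that there are finite constants $D_1$, $D_2$, $D_3$ such that:

$$\mathbb{E}\{\beta(t)^2\} \le D_1 + D_2\mathbb{E}\{p(t)^2\} + D_3\mathbb{E}\{Q_{max}(t)^2\}$$

Because $Q_{max}(t)^2 \le \sum_{k=1}^{K} Q_k(t)^2$, we have:

$$\mathbb{E}\{\beta(t)^2\} \le D_1 + D_2\mathbb{E}\{p(t)^2\} + D_3\sum_{k=1}^{K} \mathbb{E}\{Q_k(t)^2\}$$

Thus, from (78)-(79) we have:

$$\sum_{t=1}^{\infty} \frac{\mathbb{E}\{\beta(t)^2\}}{t^2} \le \sum_{t=1}^{\infty} \frac{D_1}{t^2} + D_2\sum_{t=1}^{\infty} \frac{\mathbb{E}\{p(t)^2\}}{t^2} + D_3\sum_{k=1}^{K}\sum_{t=1}^{\infty} \frac{\mathbb{E}\{Q_k(t)^2\}}{t^2} < \infty$$

which proves the result. □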
Appendix C — Proof of Theorem 7 parts (a) and (b)

Proof: (Theorem 7 part (a)) The proof closely follows a similar result derived for exponential Lyapunov functions with deterministically bounded queue changes in [26]. From (50) we have:

$$\mathbb{E}\{L(Q(t+1)) \mid Q(t)\} \le L(Q(t)) + \tilde{B} - \epsilon\sum_{k=1}^{K} |Q_k(t)|$$

Therefore:

$$\mathbb{E}\{\|Q(t+1)\|^2 \mid Q(t)\} \le \|Q(t)\|^2 + \tilde{B} - \epsilon\sum_{k=1}^{K} |Q_k(t)|$$

$$\le \|Q(t)\|^2 + \tilde{B} - \frac{\epsilon}{\sqrt{w_{max}}}\sum_{k=1}^{K} \sqrt{w_k}\,|Q_k(t)|$$

$$\le \|Q(t)\|^2 + \tilde{B} - \frac{\epsilon\sqrt{2}}{\sqrt{w_{max}}}\|Q(t)\|$$

$$= \|Q(t)\|^2 + \tilde{B} - 4c\|Q(t)\|$$

where $w_{max} \triangleq \max_{k \in \{1, \ldots, K\}} w_k$ and $c \triangleq \epsilon\sqrt{2}/(4\sqrt{w_{max}})$. The third inequality above follows by (38). Now suppose that $\|Q(t)\| \ge \tilde{B}/(2c)$. It follows that:

$$\mathbb{E}\{\|Q(t+1)\|^2 \mid Q(t)\} \le \|Q(t)\|^2 + \tilde{B} - 2c\|Q(t)\| - 2c\|Q(t)\|$$

$$\le \|Q(t)\|^2 - 2c\|Q(t)\|$$

$$\le \|Q(t)\|^2 - 2c\|Q(t)\| + c^2$$

$$= (\|Q(t)\| - c)^2$$

However, we have by Jensen's inequality:

$$\mathbb{E}\{\|Q(t+1)\| \mid Q(t)\}^2 \le \mathbb{E}\{\|Q(t+1)\|^2 \mid Q(t)\}$$

Therefore:

$$\mathbb{E}\{\|Q(t+1)\| \mid Q(t)\}^2 \le (\|Q(t)\| - c)^2$$

Assume now that $\|Q(t)\| \ge \max[\tilde{B}/(2c), c]$, so that we have both that $\|Q(t)\| - c \ge 0$ and $\|Q(t)\| \ge \tilde{B}/(2c)$. Taking square roots of the above inequality then proves that whenever $\|Q(t)\| \ge \max[\tilde{B}/(2c), c]$ we have:

$$\mathbb{E}\{\|Q(t+1)\| \mid Q(t)\} \le \|Q(t)\| - c$$

Defining $a \triangleq \max[\tilde{B}/(2c), c]$ proves part (a). □
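The constants in part (a) are explicit, so the algebraic step before Jensen's inequality can be sanity-checked numerically. The following is a minimal sketch with assumed example values for $\epsilon$, $w_{max}$, and $\tilde{B}$; the function names `drift_constants` and `second_moment_bound` are hypothetical. It verifies that the bound $\|Q\|^2 + \tilde{B} - 4c\|Q\|$ is at most $(\|Q\| - c)^2$ whenever $\|Q\| \ge a$.

```python
import math

def drift_constants(eps, w_max, B_tilde):
    """Return (c, a) with c = eps*sqrt(2)/(4*sqrt(w_max)) and a = max(B_tilde/(2c), c)."""
    c = eps * math.sqrt(2.0) / (4.0 * math.sqrt(w_max))
    a = max(B_tilde / (2.0 * c), c)
    return c, a

def second_moment_bound(norm_q, B_tilde, c):
    """Bound ||Q||^2 + B_tilde - 4c||Q|| on E{||Q(t+1)||^2 | Q(t)} from the proof."""
    return norm_q ** 2 + B_tilde - 4.0 * c * norm_q

# Example values are assumptions chosen for illustration.
c, a = drift_constants(eps=0.5, w_max=2.0, B_tilde=3.0)
checks = [second_moment_bound(x, 3.0, c) <= (x - c) ** 2 + 1e-9
          for x in (a, a + 1.0, 10.0 * a, 1000.0 * a)]
print(c, a, all(checks))
```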
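Proof: (Theorem 7 part (b)) We have $Q(t+1) = Q(t) + d(t)$, where $d(t) \triangleq (d_1(t), \ldots, d_K(t))$. Define $\gamma(t) \triangleq \|Q(t+1)\| - \|Q(t)\|$. Then $|\gamma(t)| \le \|d(t)\|$ (by the triangle inequality), and we have:

$$\|Q(t+1)\|^4 = (\|Q(t)\| + \gamma(t))^4$$

$$= \|Q(t)\|^4 + 4\|Q(t)\|^3\gamma(t) + 6\|Q(t)\|^2\gamma(t)^2 + 4\|Q(t)\|\gamma(t)^3 + \gamma(t)^4 \qquad (80)$$

However, note by part (a) that $\mathbb{E}\{\gamma(t) \mid Q(t)\} \le -c$ whenever $\|Q(t)\| \ge a$ (for some constants $c > 0$, $a > 0$). Thus:

$$4\|Q(t)\|^3\,\mathbb{E}\{\gamma(t) \mid Q(t)\} \le \begin{cases} -4c\|Q(t)\|^3 & \text{if } \|Q(t)\| \ge a \\ 4a^3\,\mathbb{E}\{\|d(t)\| \mid Q(t)\} & \text{if } \|Q(t)\| < a \end{cases}$$

Hence:

$$4\|Q(t)\|^3\,\mathbb{E}\{\gamma(t) \mid Q(t)\} \le -4c\|Q(t)\|^3 + 4a^3\,\mathbb{E}\{\|d(t)\| \mid Q(t)\} + 4ca^3$$

Taking conditional expectations of (80) and substituting the above yields:

$$\mathbb{E}\{\|Q(t+1)\|^4 \mid Q(t)\} \le \|Q(t)\|^4 - 4c\|Q(t)\|^3 + 4a^3\,\mathbb{E}\{\|d(t)\| \mid Q(t)\} + 4ca^3 + 6\|Q(t)\|^2\,\mathbb{E}\{\|d(t)\|^2 \mid Q(t)\} + 4\|Q(t)\|\,\mathbb{E}\{\|d(t)\|^3 \mid Q(t)\} + \mathbb{E}\{\|d(t)\|^4 \mid Q(t)\} \qquad (81)$$

Because $\|d(t)\| \le g\sum_{k=1}^{K} |d_k(t)|$ (where $g \triangleq \sqrt{w_{max}/2}$, with $w_{max} \triangleq \max_{k \in \{1, \ldots, K\}} w_k$), we have:

$$\mathbb{E}\{\|d(t)\|^4 \mid Q(t)\} \le g^4\sum_{k=1}^{K}\sum_{i=1}^{K}\sum_{j=1}^{K}\sum_{l=1}^{K} \mathbb{E}\{|d_k(t)||d_i(t)||d_j(t)||d_l(t)| \mid Q(t)\}$$

However, by repeated application of Cauchy-Schwarz and the fact that $\mathbb{E}\{d_k(t)^4 \mid Q(t)\} \le D$, we have:

$$\mathbb{E}\{|d_k(t)||d_i(t)||d_j(t)||d_l(t)| \mid Q(t)\} \le D$$

Thus:

$$\mathbb{E}\{\|d(t)\|^4 \mid Q(t)\} \le g^4K^4D \triangleq \tilde{D} \qquad (82)$$

Further, by Jensen's inequality:

$$\mathbb{E}\{\|d(t)\|^3 \mid Q(t)\} \le \mathbb{E}\{\|d(t)\|^4 \mid Q(t)\}^{3/4} \le \tilde{D}^{3/4} \qquad (83)$$

$$\mathbb{E}\{\|d(t)\|^2 \mid Q(t)\} \le \mathbb{E}\{\|d(t)\|^4 \mid Q(t)\}^{1/2} \le \tilde{D}^{1/2} \qquad (84)$$

$$\mathbb{E}\{\|d(t)\| \mid Q(t)\} \le \mathbb{E}\{\|d(t)\|^4 \mid Q(t)\}^{1/4} \le \tilde{D}^{1/4} \qquad (85)$$

Substituting (82)-(85) into (81) yields:

$$\mathbb{E}\{\|Q(t+1)\|^4 \mid Q(t)\} - \|Q(t)\|^4 \le -4c\|Q(t)\|^3 + 4a^3\tilde{D}^{1/4} + 4ca^3 + 6\|Q(t)\|^2\tilde{D}^{1/2} + 4\|Q(t)\|\tilde{D}^{3/4} + \tilde{D} \qquad (86)$$

Because the term $-4c\|Q(t)\|^3$ is the dominant term on the right-hand side above (for $\|Q(t)\|$ large), there must be a constant $b_1 > 0$ such that:

$$-2c\|Q(t)\|^3 + 4a^3\tilde{D}^{1/4} + 4ca^3 + 6\|Q(t)\|^2\tilde{D}^{1/2} + 4\|Q(t)\|\tilde{D}^{3/4} + \tilde{D} \le 0$$

whenever $\|Q(t)\| \ge b_1$. Thus, the right-hand side of (86) is less than or equal to $-2c\|Q(t)\|^3$ whenever $\|Q(t)\| \ge b_1$, and is less than or equal to $4a^3\tilde{D}^{1/4} + 4ca^3 + 6b_1^2\tilde{D}^{1/2} + 4b_1\tilde{D}^{3/4} + \tilde{D}$ otherwise. It follows that there are constants $b_2 > 0$, $c > 0$ such that for all $t$ and all $Q(t)$ we have:

$$\mathbb{E}\{\|Q(t+1)\|^4 \mid Q(t)\} - \|Q(t)\|^4 \le b_2 - 2c\|Q(t)\|^3$$

Taking expectations of the above yields:

$$\mathbb{E}\{\|Q(t+1)\|^4\} - \mathbb{E}\{\|Q(t)\|^4\} \le b_2 - 2c\,\mathbb{E}\{\|Q(t)\|^3\}$$

Summing the above over $t \in \{0, \ldots, M-1\}$ and dividing by $M$ yields:

$$\frac{\mathbb{E}\{\|Q(M)\|^4\} - \mathbb{E}\{\|Q(0)\|^4\}}{M} \le b_2 - \frac{2c}{M}\sum_{t=0}^{M-1} \mathbb{E}\{\|Q(t)\|^3\}$$

Rearranging terms and using the fact that $\|Q(M)\|^4 \ge 0$ yields:

$$\frac{1}{M}\sum_{t=0}^{M-1} \mathbb{E}\{\|Q(t)\|^3\} \le \frac{b_2}{2c} + \frac{\mathbb{E}\{\|Q(0)\|^4\}}{2cM} \le \frac{b_2}{2c} + \frac{\mathbb{E}\{\|Q(0)\|^4\}}{2c}$$

This completes the proof of part (b). □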
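Appendix D

Lemma 4: Suppose $\{x_i\}_{i=1}^{\infty}$ is an infinite sequence of non-negative real numbers such that there are constants $C > 0$ and $0 \le \theta < 1$ such that:

$$\sum_{i=1}^{M} x_i \le CM^{1+\theta} \quad \forall M \in \{1, 2, 3, \ldots\}$$

Then:

$$\sum_{i=1}^{\infty} \frac{x_i}{i^2} < \infty$$

Proof: For $M \in \{1, 2, 3, \ldots\}$, define $\phi(M)$ as:

$$\phi(M) \triangleq \frac{1}{M^2}\sum_{i=1}^{M} x_i$$

Then clearly:

$$\phi(M) \le \frac{C}{M^{1-\theta}} \quad \forall M \in \{1, 2, 3, \ldots\} \qquad (87)$$

On the other hand, from the definition of $\phi(M)$ we have for all $M \in \{1, 2, 3, \ldots\}$:

$$\phi(M+1) = \phi(M)\frac{M^2}{(M+1)^2} + \frac{x_{M+1}}{(M+1)^2}$$

So:

$$\phi(M+1) = \phi(M)\left[1 - \frac{2}{M+1} + \frac{1}{(M+1)^2}\right] + \frac{x_{M+1}}{(M+1)^2}$$

Thus:

$$\frac{x_{M+1}}{(M+1)^2} = \phi(M+1) - \phi(M) + \frac{2\phi(M)}{M+1} - \frac{\phi(M)}{(M+1)^2}$$

$$\le \phi(M+1) - \phi(M) + \frac{2\phi(M)}{M+1}$$

where the final inequality holds because $\phi(M) \ge 0$. Summing the above over $M \in \{1, \ldots, G\}$ for some positive integer $G$ yields:

$$\sum_{M=1}^{G} \frac{x_{M+1}}{(M+1)^2} \le \phi(G+1) - \phi(1) + 2\sum_{M=1}^{G} \frac{\phi(M)}{M+1}$$

Because $\phi(1) = x_1$, rearranging the above yields:

$$\sum_{M=1}^{G+1} \frac{x_M}{M^2} \le \phi(G+1) + 2\sum_{M=1}^{G} \frac{\phi(M)}{M+1}$$

$$\le \frac{C}{(G+1)^{1-\theta}} + 2\sum_{M=1}^{G} \frac{C}{M^{1-\theta}(M+1)} \qquad (88)$$

where (88) follows from (87). Because $\theta < 1$, the first term on the right-hand side of (88) goes to 0 as $G \to \infty$, and the second term is a summable series and hence is less than a bounded constant as $G \to \infty$. Thus:

$$\sum_{M=1}^{\infty} \frac{x_M}{M^2} < \infty$$

□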
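Lemma 4 can be illustrated numerically. The sketch below assumes the hypothetical sequence $x_i = i^{\theta}$ with $\theta = 1/2$, which satisfies the hypothesis with $C = 1$ because $\sum_{i=1}^{M} i^{\theta} \le M \cdot M^{\theta}$. The sequence, the truncation length, and the name `weighted_series` are choices made for this example only. For this choice the weighted series equals $\sum_i i^{-3/2}$, whose partial sums stay below $\zeta(3/2) \approx 2.612$.

```python
def weighted_series(x):
    """Partial sums of sum_i x_i / i^2 for i = 1..len(x)."""
    total, out = 0.0, []
    for i, xi in enumerate(x, start=1):
        total += xi / (i * i)
        out.append(total)
    return out

# Hypothetical sequence x_i = i**theta, so sum_{i<=M} x_i <= M**(1 + theta) with C = 1.
theta = 0.5
x = [i ** theta for i in range(1, 200_001)]
partial = weighted_series(x)
print(partial[999], partial[-1])  # partial sums increase but remain bounded
```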
Appendix E — Proof of Theorem 3 on Rate Stability

We prove Theorem 3 with the help of two preliminary lemmas. Let $Q(t)$ be a non-negative stochastic process defined over $t \in \{0, 1, 2, \ldots\}$. Fix $\delta > 0$, and for each non-negative integer $n$ define $t_n(\delta)$ by:

$$t_n(\delta) \triangleq \lceil n^{1+\delta} \rceil \qquad (89)$$

where $\lceil x \rceil$ represents the smallest integer greater than or equal to $x$. The sequence $\{t_n(\delta)\}_{n=0}^{\infty}$ is a (sparse) subsequence of the non-negative integers that increases super-linearly with $n$. Lemma 5 below shows that if $\mathbb{E}\{Q(t)^2\}$ grows at most linearly with $t$, then $Q(t)$ is rate stable when sampled over the subsequence $\{t_n(\delta)\}_{n=0}^{\infty}$. We note that rate stability over this sparse sampling is not as strong as ordinary rate stability. This is because $Q(t)/t$ may not converge to zero, even though it converges to 0 over the sparse sampling. However, Lemma 6 below shows that rate stability over the sparse sampling, together with an additional second moment bound on changes in $Q(t)$, is sufficient to ensure ordinary rate stability.

Lemma 5: Suppose there is a finite constant $C > 0$ and a positive integer $t^*$ such that:

$$\mathbb{E}\{Q(t)^2\} \le Ct \quad \forall t \ge t^*$$

Then for any $\delta > 0$, $Q(t)$ is rate stable when sampled over the subsequence of times $\{t_n(\delta)\}_{n=0}^{\infty}$. That is:

$$\lim_{n\to\infty} \frac{Q(t_n(\delta))}{t_n(\delta)} = 0 \quad \text{(w.p.1)}$$

Proof: Fix $\epsilon > 0$. It suffices to show that:

$$\lim_{M\to\infty} Pr\left[\cup_{n \ge M}\{Q(t_n(\delta))/t_n(\delta) > \epsilon\}\right] = 0 \qquad (90)$$

To this end, note by the Markov inequality that for any slot $t \ge t^*$:

$$Pr[Q(t)/t > \epsilon] = Pr[Q(t)^2 > \epsilon^2 t^2] \le \frac{\mathbb{E}\{Q(t)^2\}}{\epsilon^2 t^2} \le \frac{C}{\epsilon^2 t}$$

Substituting $t = t_n(\delta)$ into the above inequality (assuming that $t_n(\delta) \ge t^*$) yields:

$$Pr[Q(t_n(\delta))/t_n(\delta) > \epsilon] \le \frac{C}{\epsilon^2 t_n(\delta)} \le \frac{C}{\epsilon^2 n^{1+\delta}}$$

Therefore, by the union bound, we have for any positive integer $M$ such that $t_M(\delta) \ge t^*$:

$$0 \le Pr\left[\cup_{n \ge M}\{Q(t_n(\delta))/t_n(\delta) > \epsilon\}\right] \le \sum_{n=M}^{\infty} Pr[Q(t_n(\delta))/t_n(\delta) > \epsilon] \le \sum_{n=M}^{\infty} \frac{C}{\epsilon^2 n^{1+\delta}} < \infty$$

Thus, the probability on the left-hand side of the above chain of inequalities is bounded by the tail of a convergent series, and so (90) holds. □

Lemma 6: Suppose there is a finite constant $C > 0$ and a positive integer $t^*$ such that:

$$\mathbb{E}\{Q(t)^2\} \le Ct \quad \forall t \ge t^*$$

Further suppose there is a finite constant $D > 0$ such that for all $t \in \{0, 1, 2, \ldots\}$ we have:

$$\mathbb{E}\{(Q(t+1) - Q(t))^2\} \le D$$

Then $Q(t)$ is rate stable.
Proof: Fix a value $\delta$ such that $0 < \delta < 1$ and $\delta + (3/4)(1 + \delta) < 1$. By Lemma 5 we know that $Q(t)$ is rate stable when sampled over times $\{t_n(\delta)\}_{n=0}^{\infty}$, where $t_n(\delta)$ is defined in (89). For simplicity of notation, below we write "$t_n$" in replacement for "$t_n(\delta)$." Thus, $t_n \triangleq \lceil n^{1+\delta} \rceil$, and:

$$\lim_{n\to\infty} \frac{Q(t_n)}{t_n} = 0 \quad \text{(w.p.1)}$$

Now note by the Markov inequality that for all $t > 0$:

$$Pr[|Q(t+1) - Q(t)| > t^{3/4}] = Pr[(Q(t+1) - Q(t))^2 > t^{3/2}] \le \frac{D}{t^{3/2}}$$

Thus, for any integer $M > 0$:

$$Pr\left[\cup_{t \ge M}\{|Q(t+1) - Q(t)| > t^{3/4}\}\right] \le \sum_{t=M}^{\infty} \frac{D}{t^{3/2}} < \infty$$

Thus:

$$\lim_{M\to\infty} Pr\left[\cup_{t \ge M}\{|Q(t+1) - Q(t)| > t^{3/4}\}\right] = 0$$

It follows that, with probability 1, there is some positive random integer $K$ such that $|Q(t+1) - Q(t)| \le t^{3/4}$ for all $t \ge K$.

Now for any integer $t > 0$, define $n(t)$ as the integer such that $t_{n(t)} \le t < t_{n(t)+1}$. Then for any $t > 0$ such that $t_{n(t)} \ge K$, we have:

$$Q(t) \le Q(t_{n(t)}) + [t_{n(t)+1} - t_{n(t)}]\,t_{n(t)+1}^{3/4}$$

$$\le Q(t_{n(t)}) + [t_{n(t)+1} - t_{n(t)}][(n(t)+1)^{(3/4)(1+\delta)} + 1]$$

Thus:

$$0 \le \frac{Q(t)}{t} \le \frac{Q(t_{n(t)}) + [t_{n(t)+1} - t_{n(t)}][(n(t)+1)^{(3/4)(1+\delta)} + 1]}{t_{n(t)}} \qquad (91)$$

On the other hand, for any $n > 0$ we have by a Taylor expansion:

$$t_{n+1} \le 1 + (n+1)^{1+\delta}$$

$$\le 1 + n^{1+\delta} + (1+\delta)n^{\delta} + \frac{(1+\delta)\delta}{2}n^{\delta-1}$$

$$\le a + n^{1+\delta} + (1+\delta)n^{\delta}$$

where $a \triangleq 1 + (1+\delta)\delta/2$. Thus, for any $n(t) > 0$ we have:

$$t_{n(t)+1} - t_{n(t)} \le t_{n(t)+1} - n(t)^{1+\delta} \le a + (1+\delta)n(t)^{\delta}$$

Using this in (91) yields:

$$0 \le \frac{Q(t)}{t} \le \frac{Q(t_{n(t)}) + [a + (1+\delta)n(t)^{\delta}][(n(t)+1)^{(3/4)(1+\delta)} + 1]}{t_{n(t)}}$$

$$= \frac{Q(t_{n(t)})}{t_{n(t)}} + \frac{a[(n(t)+1)^{(3/4)(1+\delta)} + 1]}{t_{n(t)}} + \frac{(1+\delta)n(t)^{\delta}[(n(t)+1)^{(3/4)(1+\delta)} + 1]}{t_{n(t)}}$$

Taking limits and using the fact that $Q(t_{n(t)})/t_{n(t)} \to 0$ with probability 1, the fact that $t_{n(t)} \ge n(t)^{1+\delta}$, and the fact that $\delta + (3/4)(1 + \delta) < 1$, yields:

$$0 \le \lim_{t\to\infty} Q(t)/t \le 0 \quad \text{(w.p.1)}$$

□
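The sparse sampling times and the gap bound used in the proof of Lemma 6 are easy to compute. The following is a minimal sketch, assuming the example value $\delta = 0.1$ (any $\delta$ with $\delta + (3/4)(1+\delta) < 1$ would do) and a hypothetical helper `t_n`. It checks the bound $t_{n+1} - t_n \le a + (1+\delta)n^{\delta}$ over a finite range of $n$.

```python
import math

def t_n(n, delta):
    """Sampling times t_n(delta) = ceil(n**(1 + delta)) from (89)."""
    return math.ceil(n ** (1.0 + delta))

delta = 0.1  # example value; satisfies delta + (3/4)(1 + delta) < 1
assert delta + 0.75 * (1.0 + delta) < 1.0
a = 1.0 + (1.0 + delta) * delta / 2.0

# Gap bound from the proof: t_{n+1} - t_n <= a + (1 + delta) * n**delta for n >= 1.
gaps_ok = all(
    t_n(n + 1, delta) - t_n(n, delta) <= a + (1.0 + delta) * n ** delta
    for n in range(1, 5001)
)
print([t_n(n, delta) for n in range(6)], gaps_ok)
```

The gaps between consecutive sampling times grow only like $n^{\delta}$, which is what allows the per-slot change bound $t^{3/4}$ to control $Q(t)$ between samples.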
We now prove Theorem 3. Let $Q(t) = (Q_1(t), \ldots, Q_K(t))$ be a stochastic vector defined over $t \in \{0, 1, 2, \ldots\}$. Assume $Q(t)$ has real-valued entries. Define the quadratic Lyapunov function $L(Q(t))$ as in (11) and define the drift $\Delta(\mathcal{H}(t))$ as in (12). Suppose there is a finite constant $B > 0$ such that for all $\tau \in \{0, 1, 2, \ldots\}$ and all possible $\mathcal{H}(\tau)$, we have:

$$\Delta(\mathcal{H}(\tau)) \le B \qquad (94)$$

Proof: (Theorem 3 part (a)) Assume that $\mathbb{E}\{L(Q(0))\} < \infty$. Fix a slot $\tau \ge 0$. Taking expectations of (94) yields:

$$\mathbb{E}\{L(Q(\tau+1))\} - \mathbb{E}\{L(Q(\tau))\} \le B$$

Summing the above over $\tau \in \{0, 1, \ldots, t-1\}$ for some integer $t > 0$ yields:

$$\mathbb{E}\{L(Q(t))\} - \mathbb{E}\{L(Q(0))\} \le Bt$$

Substituting the definition of $L(Q(t))$ in (11) into the above inequality yields:

$$\frac{1}{2}\sum_{k=1}^{K} w_k\mathbb{E}\{Q_k(t)^2\} \le Bt + \mathbb{E}\{L(Q(0))\} \qquad (95)$$

It follows from (95) that for each $k \in \{1, \ldots, K\}$:

$$\mathbb{E}\{|Q_k(t)|\}^2 \le \mathbb{E}\{Q_k(t)^2\} \le \frac{2Bt + 2\mathbb{E}\{L(Q(0))\}}{w_k} \qquad (96)$$

and so:

$$\mathbb{E}\{|Q_k(t)|\} \le \sqrt{2Bt/w_k + 2\mathbb{E}\{L(Q(0))\}/w_k}$$

Dividing the above by $t$ and taking limits as $t \to \infty$ shows that $Q_k(t)$ is mean rate stable, proving part (a). □

Proof: (Theorem 3 part (b)) First assume that $Q(0)$ is a given finite constant (with probability 1), so that $\mathbb{E}\{L(Q(0))\} = L(Q(0))$. We have from (96) that for all $t \ge 1$ and all $k \in \{1, \ldots, K\}$:

$$\mathbb{E}\{Q_k(t)^2\} \le \frac{[2B + 2L(Q(0))]\,t}{w_k}$$

Furthermore, it can be shown that $\mathbb{E}\{(Q_k(t+1) - Q_k(t))^2\} \le D$ implies $\mathbb{E}\{(|Q_k(t+1)| - |Q_k(t)|)^2\} \le D$. Thus, the conditions required to apply Lemma 6 hold (using $Q(t) = |Q_k(t)|$, $t^* = 1$ and $C = [2B + 2L(Q(0))]/w_k$). Then Lemma 6 ensures $|Q_k(t)|$ is rate stable for all $k \in \{1, \ldots, K\}$, and hence $Q_k(t)$ is rate stable for all $k \in \{1, \ldots, K\}$. The above holds whenever the initial condition $Q(0)$ is any given finite constant, and hence it holds whenever $Q(0)$ is finite with probability 1. □